A new look
for Nature 


The journal has been redesigned for clearer 
research communication in the digital age. 


From today, Nature will look a little different. We are
unveiling a redesign that will, we hope, help us fulfil 
our mission to serve researchers and disseminate 
scientific knowledge worldwide. 

This design has been in development for well 
over a year, and is a much-needed update that helps us — in
our 150th year — to communicate science with fresh clarity 
and style. We love it, and we hope that you, our readers, 
do, too. 

Nature has had a number of design transformations over
its history — but they were all based on one assumption: 
that our content would be accessed through the medium 
of static ink printed on a physical page. Not any more. That’s
why we have developed a design that is suited to digital 
platforms — where the vast majority of readers now find 
us — while at the same time producing a clear and engaging 
printed edition. 

In surveys and interviews, readers told us that our text 
can be hard to read; and that research articles increasingly 
need to do justice to complex data sets. We knew that it 
would be challenging to come up with a compelling design
that meets these needs and also works across formats, but 
working with renowned editorial designer Mark Porter, we 
listened, we experimented and we have now acted. 

One of the first things you might notice is that the Nature 
logo has changed — this will be the 11th iteration. It’s a fresh
take on the nature-with-a-small-n that we’ve used for the 
past half-century. But it’s not just the logo that is new — all 
our text is now in a custom typeface called Harding, named
in memory of Anita Harding, an inspirational professor 
at London’s Institute of Neurology who made important 
contributions to neurogenetics before her death at the age 
of just 42. 

Working with designers and typographers at Commercial 
Type, we spent months crafting the typeface to integrate 
it into Nature’s overall design language, inspired by the 
mid-century Swiss modernist school of rational design. 
This design school — sometimes called the internationalist 
school — emerged in response to nationalist design trends 
before and during the Second World War. It promoted the 
idea that graphic design should be based on a mathematical
grid, allowing designers to arrange type and images with
a semblance of order, as Nature’s creative director Kelly
Krause explains in this issue (see page 476). 

The result is a printed journal with text that should be 
easier to read. We have also adjusted some of the organiza- 
tion and labelling to help readers navigate between sections. 
From now on, all our research content will also be published 
in the ‘Article’ format; the shorter ‘Letter’ format has been
retired. This will give all the research we publish equal
prominence and add to the extended-data section we created in
2013 to integrate supporting data sets into online papers. 

We have also introduced a new back-page article, called 
‘Where I work’, which profiles researchers, and those con- 
nected to research, in the places where they study, work 
and think. Through a combination of striking photography 
and first-person narrative, our goal is to provide a glimpse 
into the lives of people of all ages from around the world. 
Fans of our Futures articles should not mourn: the journal’s 
science-fiction series continues online. 

The redesign process is not over, and you can expect 
to see more digital changes over the coming year, along 
with new print and digital design principles for all Nature- 
branded journals. 

Nothing is more important to Nature than communi- 
cating science with authenticity, accuracy and clarity. We 
hope the new design does this with a dash of style and with 
imagination, too. Please tell us what you think. As always, 
we would welcome your ideas and suggestions for further 
improvements. 


Precarious 
Supremacy 


Quantum computing will suffer if 
claims of supremacy are overhyped. 


Researchers led by Google’s AI Quantum team have
demonstrated ‘quantum supremacy’ by creating a chip that
performed a computational task faster than a classical
computer. As we report on page 461, a task that the
researchers say would have taken the world’s fastest
supercomputer 10,000 years was completed in about
3 minutes (F. Arute et al. Nature 574, 505-510; 2019).

As the world digests this achievement — including the 
claim that some quantum computational tasks are beyond 
supercomputers — it is too early to say whether supremacy 
represents a new dawn for information technology. It could 
be that we are looking at quantum computing’s Kitty Hawk 
moment — a reference to the many decades between the
Wright brothers’ first flight at Kitty Hawk, North Carolina,
in 1903 and the advent of the jet age (page 487). At the very
least, quantum computers as a routine part of life are likely 
to be decades or more into the future. 

Still, this achievement in science and engineering should 
certainly not be underestimated. Research teams around 
the world have been working intensely to unleash the 
processing power of quantum phenomena: these include 
superposition, in which particles seem to have multiple 
states until they are observed; and entanglement, which 
describes how the properties of quantum systems can be
tied together. If these behaviours can be more precisely 
controlled, they would generate exponential gains in pro- 
cessing power for certain tasks compared with today’s 
supercomputers. And that is what the team at Google has 
achieved. 

Its chip, known as Sycamore, comprises just 53 
individually controllable superconducting quantum bits 
(qubits), the basic building blocks of quantum computers.
The team chose to calculate the outputs of a random
quantum circuit — rather like a quantum random number 
generator. This is not an easy problem, and the Summit 
supercomputer at Oak Ridge National Laboratory in Tennes- 
see, the world’s most powerful machine in its class, would 
have taken 10 millennia to complete it, the researchers say. 
Sycamore needed only 200 seconds. 

Summit can call on more than 9,000 of the most powerful
central processing units (8 billion transistors in each) and 
nearly 28,000 graphics processors (21 billion transistors 
each). With such raw computing power outgunned by just 
53 qubits, it’s understandable that quantum computers are 
generating such excitement and optimism. 

But this demonstration of quantum supremacy is 
extremely limited. There’s a vast gap to be bridged before 
quantum computers can do more meaningful things — such 
as simulating the properties of materials or chemical reac- 
tions, or accelerating drug discovery. 

For one thing, quantum computers are highly sensitive 
to environmental noise — including everyday phenomena 
such as temperature changes and electromagnetic fields. 
And researchers are a long way from being able to design 
out these and other obstacles. 

Instead of proceeding with caution, a quantum gold 
rush is under way, with investors joining governments and 
companies to pour large sums of money into developing 
quantum technologies. Unrealistic expectations are being 
fuelled that powerful general-purpose quantum computers 
could soon be on the horizon. Such misguided optimism 
could be dangerous for the future of this still-fledgling field. 

Such a landscape has created a flourishing network of 
quantum technologists, but those providing the funding 
will eventually seek a return on investment. There are 
already concerns that some firms are over-promising, 
which is why over-hyping this landmark demonstration 
could raise expectations further. Researchers fear that, if 
quantum computers fail to deliver anything useful soon, a 
‘quantum winter’ could descend in which research progress 
slows, investment stalls and disillusion sets in. 

The powerful processors that underpin today’s devices 
such as smartphones were developed from decades of 
sustained investment — often public investment — in 
research. Quantum processors will similarly require what 
innovation economists call ‘patient capital’. 

Too often in the history of science and technology, 
expectations are raised, only for reality to get in the way. 
Quantum computers are still near the start of a long and
unpredictable journey. As they encounter challenges and 
costs start to mount, researchers must know that they can 
reach their destination. 
Young universities
show leadership 


Thriving new institutions can share lessons 
in building research and publishing capacity. 


This week, an analysis from Nature Research’s Nature Index
team (see supplement, page S53) looks at the contribution of
‘young’ universities to research publishing in the natural
sciences. Young universities — those aged 50 years or
less — are quickly establishing a reputation in teaching
and research. However, in Africa, more needs to be done
to build their capacity.

The analysis looked at the contributions of authors from 
100 young universities in 2018 to 82 journals in the natural 
sciences. The journals were chosen by an independent panel 
of researchers, and span the life sciences, physical sciences, 
chemistry, and Earth and environmental sciences. Author 
contribution was recorded in several ways, including the 
total number of articles published by an institution’s affil- 
iated researchers, as well as the share of each institution’s 
contribution to those articles. 

In most assessments of research-intensive universities, 
those in the United States and Europe tend to dominate. But 
among the leading 100 younger universities, there is much 
more of an east-west mix, spread across China (11 univer- 
sities), Germany (11), India (10), Australia (9), South Korea 
(8) and the United States (8). 

Authors from the University of the Chinese Academy of 
Sciences in Beijing are by far the most prolific, contributing 
1,816 articles to the listed journals. That is on a par with the
number of articles from older institutions in the United 
States, Europe and Japan, and substantially ahead of sec- 
ond-placed Nanyang Technological University in Singapore 
(569 articles). 

The absence of institutions from Africa in the analysis 
is partly because many authors there publish in journals 
that the index does not capture, including in fields such 
as agriculture, water resources, primary health care and 
education. But a comparative lack of financial resources for 
researchers in the natural sciences is also a factor. 

In the spirit of south-south collaboration, universities 
recognized for their publishing in the natural sciences have 
an opportunity to support those in need of a boost. Many of 
the young universities assessed in the index are in countries 
that, even one generation ago, were at an earlier stage of 
development. They will have valuable lessons to pass on 
in building research and publishing capacity. 

China’s fast-expanding universities are already doing this 
through the Belt and Road Initiative. Rising institutions in 
other countries, too, will find mutual benefits by sharing 
experiences and working with research partners in Africa 
and elsewhere in the global south. 


A personal take on science and society 


World view 


By Carlos A. Nobre


To save Brazil’s rainforest, boost its science


My country’s government is 
squeezing research and pushing 
the rainforest to the brink. 


This August, the skies outside my office in São
Paulo, Brazil, were filled with smoke from the
fires in the Amazon rainforest. I recalled the 
forest that I saw in the 1970s as a teenager on 
family holidays — its beauty, powerful rivers, 
Indigenous peoples and continuous rains — and thought of 
how much my country (and the world) could lose. Bishops 
throughout the Amazon region gathered in the Vatican this 
month to pray and strategize on behalf of “integral ecology, 
the cry of the Earth and the poor”. Reversing the situation 
in Brazil is essential for the “good living” the synod seeks. 

The smoke has disappeared from my city, for now, but 
Brazil’s rainforest has never been in greater peril. Nor has 
its science — so badly needed to buttress sustainable agri- 
business and invent an economy centred on an intact forest. 
Last month, thousands of graduate students learnt their 
scholarships would not be renewed. 

Since he came into power in January, Brazil’s President Jair
Bolsonaro has relaxed the enforcement of laws that prohibit 
most of the clearing and burning of the Amazon. Analyses 
of satellite imagery show that this year’s dry season brought 
nearly twice as many fires as last year’s, and that the flames 
were bright on satellite images — as expected from burning 
a large amount of biomass from recently clearcut forest. 
Initial estimates indicate that more than 90% of these fires 
were illegal. Rather than face such alarming data, Bolsonaro 
fired the director of the agency that monitors deforestation. 

These are huge setbacks. From 2005 to 2014, Brazil 
reduced its annual rate of deforestation by about 75%. Over 
the same timeframe, the value of agricultural production 
increased by about 200%. Science and technology fostered 
this progress. Satellite-based monitoring developed by 
Brazil’s National Institute of Space Research provided daily 
alerts of deforestation. This facilitated effective law enforce- 
ment and incentive programmes for protecting the forest. 

My country’s example had served as inspiration for 
others. Without Brazil as a model, I expect deforestation 
will accelerate across the Amazon. It is already increasing in 
Colombia, Peru and Bolivia. Twelve years ago, my colleagues 
and I calculated that, if the rainforest’s area shrank by 40%
of its expanse in the 1970s, it could not grow back — and
as much as 70% of the original forest could transform to
drier, hotter savanna (G. Sampaio et al. Geophys. Res. Lett. 34,
L17709; 2007). With rising global temperatures, deforesta- 
tion, fires and concomitant dryness, that margin has shrunk. 

Across most of the Amazon basin, the dry season is already 
several weeks longer, particularly over deforested areas; 
these return less moisture to the atmosphere than the
rainforest does overall. If deforestation continues, the
rainforest could collapse. Many livelihoods will be impossible to
maintain. Less rain will fall, temperatures will rise and tens of
thousands of species will be lost along with the forest’s power
to absorb as much as 5% of the world’s carbon emissions.

Deforestation to increase agricultural lands is no 
longer necessary. The Brazilian Amazon has an estimated 
17 million hectares of degraded and non-productive lands 
that could be restored and used for sustainable agriculture, 
including new forest products. Land already in production 
has the capacity to raise yields severalfold. Innovative tech- 
nologies and smart management could deliver a ‘bio-econ- 
omy’ based on the sustainable extraction of materials for 
goods, ranging from pharmaceuticals to foods (açaí is the
most famous example), cosmetics and other materials. 
With effective monitoring and enforcement, these ‘bio-in- 
dustries’ can boost the economy, respect social rights and 
traditional peoples and protect the Amazon’s ecosystems. 

All this will be impossible if Brazil’s scientific capacity 
withers. In 1998, fewer than 4,000 PhD students graduated.
Last year, there were more than 22,000. Government funds 
for science grew steadily from the 1980s, but declined under 
the economic recession of 2015. Unlike countries such as 
South Korea, China and Germany that invested more in sci- 
ence to build resilience to economic turbulence, Brazilian 
science has faced severe budget cuts year after year. 

Now, instead of watching younger compatriots build our 
scientific establishment, I see them all but forced to leave. 
The government is also changing the direction of research. It
aims to replace the forest with livestock farming, monocul- 
ture crops, mining operations and huge hydropower plants. 

For much of my career, I have worked to reconcile 
apparently opposing views of land use: some people advo- 
cate setting aside large tracts for conservation, and others 
champion ‘resource-intensive development’ based on agri- 
culture, livestock, energy and mining. I feel we must instead 
focus on building a different, sustainable paradigm in which 
the forest contributes to well-being. Beyond the progress 
that has already happened in slowing deforestation and 
boosting sustainable agribusiness, I see much promise in 
an initiative called Amazonia 4.0, after the Fourth Indus- 
trial Revolution of biotechnologies, digital technologies 
and material science. To realize that promise, we will need 
scientists and engineers more than ever. 

I fear both Brazil’s science and the Amazon rainforest 
are approaching a tipping point — from which recovery is 
probably impossible. To avoid it, scientists in and outside 
Brazil should protest vigorously against the anti-science 
movement and speak clearly to society about how important
science and the Amazon are for human well-being and the 
sustainability of the planet. 
The world this week


News in brief


RESEARCH CONCERNS
MOUNT AS BREXIT
DRAMA UNFOLDS


Brexit uncertainty is paralysing
UK politics. It is also taking a
toll on the nation’s science by
making researchers unsure 
about their future role in 
European research, according to 
the Royal Society in London. 

The society says that the 
United Kingdom’s annual share 
of research funding from the 
European Union’s flagship 
Horizon 2020 programme fell by 
almost one-third between 2015 
and 2018. This is because UK 
applications for Horizon 2020 
grants dropped by 39% owing 
to a lack of confidence over the
country’s future participation in
European research, the society
says in a report released on
16 October.

As a result, Horizon 2020
funding for UK science dropped 
by around €500 million (US$560 
million), said Royal Society 
president Venki Ramakrishnan in 
a statement. He added that there 
had also been a large drop in the 
number of leading researchers 
who want to come to the United 
Kingdom. “People do not want to 
gamble with their careers, when 
they have no sense of whether 
the UK will be willing and able 
to maintain its global scientific 
leadership.” 

The report shows that last 
year, the number of non-UK 
scientists coming to British 
institutions through the 
prestigious Marie Skłodowska-Curie
fellowship scheme, which
is part of Horizon 2020, was 35% 
lower than in 2015. 

As Nature went to press, 
the UK government, which 
is pushing to leave the EU on 
31 October, had agreed a deal 
on the terms of its withdrawal,
but still lacked approval from 
Parliament to proceed. 
Female scientists in Australia were less likely than their
male counterparts to win a major new type of
medical-research grant, despite an overhaul that was supposed
to address gender inequity in the country’s science
funding. The imbalance occurred in the National
Health and Medical Research Council (NHMRC)
‘investigator grants’, which were awarded for the first 
time this August. It was particularly severe at the senior- 
leadership level. Only 29.4% of senior women (5 of 17) 
who applied were successful, compared with 49.3% of 
men (37 of 75). “It’s a poor message,” says Marguerite 
Evans-Galea, a co-founder of the non-profit association 
Women in STEMM Australia. Success rates, which were 
released on the NHMRC’s website, were more closely 
matched at the early- and mid-career stages, but were 
higher overall for men than for women (14.9% versus 
11.3%). Men also received more money overall, partly
because they won more grants than women. An NHMRC 
spokesperson says that extra funding was allocated to 
several female-led applications that weren’t earmarked 
to receive money, which reduced the gender difference. 
CANADIAN SCIENCE
TAKES BACK SEAT IN
ELECTION


Canada’s Liberal party, led by 
Justin Trudeau, has won the 
most seats in the country’s
general election but not an 
overall majority in the House 
of Commons, according to 
projections available as Nature 
went to press. What the result 
means for research is unclear. 

In the lead-up to the election
on 21 October, as the Liberals 
were running neck and neck 
with the Conservative Party, 
researchers had worried that 
government support would fall 
by the wayside regardless of 
which party won. 

With the exception of climate 
change — one of the top issues 
for voters in recent polls — 
research was largely absent from 
the election campaign. 

That contrasts with the 
general election in 2015, when 
the Trudeau-led Liberal Party 
campaigned on a promise to
reverse policies by the previous 
government that were widely 
seen as anti-science — and won. 

Since then, the Liberal 
government has boosted 
research funding, freed 
government researchers to 
speak to the public without 
first getting permission from 
the administration, and raised 
the profile of environmental 
concerns such as climate change 
and ocean conservation. 

But many researchers felt 
that the government had begun 
to rest on its laurels when it
came to science. “There is some 
concern that the government 
feels like they’re done. They’ve 
checked the box and they’re 
moving on,” says Katie Gibbs, 
executive director of the 
campaign group Evidence for 
Democracy in Ottawa. 

The Liberals are now expected 
to form a coalition government.
US PAIR COMPLETES
HISTORIC ALL-FEMALE 
SPACEWALK 


NASA astronauts Christina Koch 
and Jessica Meir performed the 
first all-female spacewalk on
18 October, to repair a faulty
battery unit on the International 
Space Station. The roughly 
seven-hour spacewalk was the 
fourth for Koch (above right), 
an electrical engineer who is 
on track to set a record for the
longest single spaceflight by a 
woman; if all goes to plan, she 
will spend 328 days in space 
before returning to Earth in
February. Meir (above left),
a biologist, had never before
attempted a spacewalk. 

“This is really just us doing our 
jobs,” Meir said during the walk, 
which NASA broadcast live on
the Internet. 

During the event, the 
astronauts received a call 
from US President Donald 
Trump. “The job that you do is
incredible,” he told Koch and 
Meir. “I’m thrilled to be speaking 
with two brave American 
astronauts making history.” 

The two US astronauts are the 
14th and 15th women to walk 
in space. Russian cosmonaut 
Svetlana Savitskaya was the 
first, in 1984, followed by 
14 Americans. 


DETECT CANCER 
WHEN IT’S SMALL
AND TREATABLE 


Catching cancer early is the 
focus of a new transatlantic
research collaboration. 

The International Alliance 
for Cancer Early Detection, 
announced on 21 October, will 
receive up to £40 million 
(US$52 million) over five 
years from the charity Cancer 
Research UK, with the possibility 
of an additional £15 million from 
Stanford University in California 
and the Oregon Health & Science 
University Knight Cancer 
Institute in Portland. 

The collaborators hope to take 
advantage of recent advances 
in cancer genetics and imaging. 
Databases are swelling with 
tumour DNA sequences, and 
researchers have begun to 
turn their sights to sequencing 
precancerous growths in an 
effort to learn which mutations 
tip some of them over into 
malignancy. Clinicians can now 
detect ever-smaller tumours, 
and metabolic changes that can 
be hallmarks of cancer, without 
surgery or removing tissue. 

Early detection could improve 
cancer treatment: five-year 
survival rates for six types of 
cancer are more than three 
times higher when the cancer is 
diagnosed at its earliest stage, 
compared with survival if the 
cancer is caught only after it has 
become more advanced and 
has started to spread to other 
locations in the body. 
EBOLA OUTBREAK
IN AFRICA
SLOWS DOWN


The Ebola outbreak in the
eastern Democratic Republic
of the Congo (DRC) is finally
waning, the World Health
Organization (WHO) said on
18 October. Fifty people were
diagnosed with Ebola in the
DRC between 25 September
and 15 October, the WHO said.
At the outbreak’s peak in April, 
roughly 300 new infections were 
reported in three weeks. Almost 
3,250 people have been infected 
since the outbreak began in 
August 2018, and more than 
2,150 have died. 

The drop in infections is not a
reason to relax efforts to contain 
the virus, WHO director-general 
Tedros Adhanom Ghebreyesus 
told reporters. “We must treat 
every case as if it is the first since 
every case has the potential to 
spark a new outbreak,” he said.

There was more good news on 
18 October, when the European 
Medicines Agency (EMA) 
recommended that the European 
Commission (EC) approve an 
Ebola vaccine produced by 
the pharmaceutical company 
Merck. About 240,000 people 
considered to be at risk from
Ebola have been given this
vaccine during the outbreak,
but it is still considered to be
an experimental product by 
regulators worldwide and 
cannot be marketed. The EC will 
make a decision within 10 weeks 
on whether to approve the 
vaccine for sale. 


News in focus 


The Sycamore chip is composed of 54 qubits, each made of superconducting loops. 
GOOGLE PUBLISHES
LANDMARK QUANTUM 
SUPREMACY CLAIM 


The company says that its quantum computer is the first to perform a
calculation that would be practically impossible for a classical machine. 


By Elizabeth Gibney 


Scientists at Google say that they have achieved quantum
supremacy, a long-awaited milestone in quantum computing.
The announcement, published in Nature on 23 October,
follows a leak of an early version of the paper five weeks ago,
which Google did not comment on at the time.

In a world first, a team led by John Martinis,
an experimental physicist at the University of
California, Santa Barbara, and Google in Mountain
View, California, says that its quantum
computer carried out a specific calculation that
is beyond the practical capabilities of regular,
‘classical’ machines (F. Arute et al. Nature 574,
505-510; 2019). The same calculation would
take even the best classical supercomputer
10,000 years to complete, Google estimates.

Quantum supremacy has long been seen as 
a milestone because it proves that quantum 
computers can outperform classical comput- 
ers, says Martinis. Although the advantage has 
now been proved only for a very specific case,
it shows physicists that quantum mechanics
works as expected when harnessed in a
complex problem. 

“It looks like Google has given us the first 
experimental evidence that quantum speed-up 
is achievable in a real-world system,” says 
Michelle Simmons, a quantum physicist at
the University of New South Wales in Sydney, 
Australia. 

The feat was first reported in September by 
the Financial Times and other outlets, after 
an early version of the paper was leaked on 
the website of NASA, which collaborates with 
Google on quantum computing, before being 
quickly taken down. At that time, the company 
did not confirm that it had written the paper, 
nor would it comment on the stories. 

Although the calculation Google chose —
checking the outputs from a quantum
random-number generator — has limited practical
applications, “the scientific achievement is
huge, assuming it stands, and I’m guessing
it will”, says Scott Aaronson, a theoretical
computer scientist at the University of Texas
at Austin.

Google’s quantum computer excels at checking the outputs of a random-number generator.

Researchers outside Google are already
trying to improve on the classical algorithms
used to tackle the problem to bring down the
10,000-year speed-up that the firm calculates.
IBM, a rival to Google in building the world’s
best quantum computers, reported in a preprint
on 21 October that the problem could be
solved in just 2.5 days using a different classical
technique (E. Pednault et al. preprint at
https://arxiv.org/abs/1910.09534; 2019). That paper
has not been peer-reviewed. If IBM is correct,
it would reduce Google’s feat to demonstrating
a quantum ‘advantage’ — doing a calculation
much faster than a classical computer, but not
something that is beyond its reach. This would
still be a significant landmark, says Simmons.
“As far as I’m aware that’s the first time that’s
been demonstrated, so that’s definitely a
big result.”
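To put the competing estimates side by side, the short Python sketch below works out the implied speed-up for each one. It uses only figures quoted in this issue (10,000 years, 2.5 days and Sycamore's roughly 200-second run); the year length of 365.25 days and the helper name speed_up are assumptions made for illustration.

```python
# Illustrative arithmetic only; the figures are the ones quoted in the text.
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365.25  # assumption: an average year length

sycamore_seconds = 200  # Sycamore's reported runtime
google_estimate_seconds = 10_000 * DAYS_PER_YEAR * SECONDS_PER_DAY  # Google's classical estimate
ibm_estimate_seconds = 2.5 * SECONDS_PER_DAY  # IBM's preprint estimate


def speed_up(classical_seconds: float, quantum_seconds: float) -> float:
    """Return how many times longer the classical run takes than the quantum one."""
    return classical_seconds / quantum_seconds


google_ratio = speed_up(google_estimate_seconds, sycamore_seconds)
ibm_ratio = speed_up(ibm_estimate_seconds, sycamore_seconds)

print(f"Google's estimate: about {google_ratio:.2e} times faster")
print(f"IBM's estimate:    about {ibm_ratio:,.0f} times faster")
```

On these numbers the gap narrows from a factor of more than a billion to a factor of about a thousand, which is the difference between 'supremacy' and 'advantage' that the paragraph describes.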


Quick solutions 

Quantum computers work in a fundamentally
different way from classical machines: a classical
bit is either a 1 or a 0, but a quantum bit,
or qubit, can exist in multiple states at once. 
When qubits are inextricably linked, physi- 
cists can, in theory, exploit the interference 
between their wave-like quantum states to 
perform calculations that might otherwise 
take millions of years. Physicists think that 
quantum computers might one day run revo- 
lutionary algorithms that could, for example, 
search unwieldy databases or factor large 
numbers — including, importantly, those 
used in encryption. But those applications
are still decades away. The more qubits are
linked, the harder it is to maintain their fragile
states while the device is operating. Google’s
algorithm runs on a quantum chip composed
of 54 qubits, each made of superconducting 
loops. But this is a tiny fraction of the one 
million qubits that could be needed for a 
general-purpose machine. 

The task Google set for its quantum
computer is “a bit of a weird one”, says Christopher
Monroe, a physicist at the University
of Maryland in College Park. Google physicists
first crafted the problem in 2016, and it
was designed to be extremely difficult for an
ordinary computer to solve. The team challenged
its computer, known as Sycamore, to
describe the likelihood of different outcomes
from a quantum version of a random-number
generator. They do this by running a circuit that
passes 53 qubits through a series of random
operations. This generates a 53-digit string of
1s and 0s — with a total of 2^53 possible
combinations (only 53 qubits were used because one
of Sycamore’s 54 was broken). The process is
so complex that the outcome is impossible to
calculate from first principles, and is therefore
effectively random. But owing to interference
between qubits, some strings of numbers are
more likely to occur than others. This is similar
to rolling a loaded die — it still produces a
random number, even though some outcomes
are more likely than others.
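A rough sense of that scale comes from counting the possible 53-digit output strings. The Python sketch below does the count and, as an added assumption that is not from the article, estimates the memory a naive classical simulation would need if it stored one 16-byte complex number per string. It is not a description of how Google or IBM actually simulated the circuit.

```python
N_QUBITS = 53  # qubits used in the experiment, as stated in the text
BYTES_PER_AMPLITUDE = 16  # assumption: one complex number stored as two 64-bit floats

# Each qubit is read out as a 0 or a 1, so there are 2**53 distinct output strings.
n_bitstrings = 2 ** N_QUBITS

# Assumption: a naive simulator keeps one amplitude in memory for every string.
state_vector_bytes = n_bitstrings * BYTES_PER_AMPLITUDE

print(f"{n_bitstrings:,} possible output strings")
print(f"{state_vector_bytes / 1e15:.0f} petabytes for a naive full state vector")
```

The count is roughly nine quadrillion strings, and the naive storage estimate runs to well over a hundred petabytes, which gives some intuition for why the problem is awkward for an ordinary computer.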

Sycamore calculated the probability 
distribution by sampling the circuit — running it 
one million times and measuring the observed 
output strings. The method is similar to rolling
the die to reveal its bias. Verifying the solution 
was a further challenge. To do that, the team 
compared the results with those from simu- 
lations of smaller and simpler versions of the 
circuits, which were done by classical comput- 
ers — including the Summit supercomputer at 
Oak Ridge National Laboratory in Tennessee. 
Extrapolating from these examples, the Google 
team estimates that simulating the full circuit 
would take 10,000 years even on a computer 
with one million processing units (equiva- 
lent to around 100,000 desktop computers). 
Sycamore took just 3 minutes and 20 seconds. 
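
The loaded-die analogy can be made concrete with a short Python sketch. It is only an illustration of estimating a biased distribution by repeated sampling, not a simulation of Sycamore or of Google's circuit: the die weights are hypothetical, and the speed comparison assumes a year of 365.25 days.

```python
import random
from collections import Counter

FACES = [1, 2, 3, 4, 5, 6]

def sample_loaded_die(weights, n_samples, seed=0):
    """Roll a weighted six-sided die n_samples times and return
    the observed frequency of each face."""
    rng = random.Random(seed)
    rolls = rng.choices(FACES, weights=weights, k=n_samples)
    counts = Counter(rolls)
    return {face: counts[face] / n_samples for face in FACES}

# Hypothetical bias (an assumption for illustration):
# face 6 is three times as likely as any other face.
true_weights = [1, 1, 1, 1, 1, 3]
total = sum(true_weights)
true_probs = {face: w / total for face, w in zip(FACES, true_weights)}

# One million rolls, mirroring the one million circuit runs in the text.
estimated = sample_loaded_die(true_weights, n_samples=1_000_000)
for face in FACES:
    print(f"face {face}: true {true_probs[face]:.3f}, "
          f"estimated {estimated[face]:.3f}")

# Scale of the real task: 53 qubits give 2**53 possible output strings.
n_strings = 2**53
print("possible 53-bit strings:", n_strings)

# Rough ratio of the two running times quoted in the text:
# 10,000 years versus 3 minutes 20 seconds (200 seconds).
seconds_per_year = 365.25 * 24 * 3600  # assumed year length
speedup = (10_000 * seconds_per_year) / 200
print(f"estimated speed-up: about {speedup:.2e}x")
```

With six faces, a million rolls pins down each probability closely. With 2⁵³ possible strings, a million samples can visit only a tiny fraction of the outcomes, which is part of why verifying Sycamore's output was a further challenge.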


Limited applications 


Monroe says that Google’s achievement might 
benefit quantum computing by attracting 
more computer scientists and engineers to 
the field. But he also warns that the news could 
create the impression that quantum computers 
are closer to mainstream practical applications 
than they really are. “The story on the street is 
‘they’ve finally beaten a regular computer: so
here we go, two years and we'll have one in our
house’,” he says.

In reality, Monroe adds, scientists are yet to
show that a programmable quantum computer 
can solve a useful task that cannot be done any 
other way, such as by calculating the electronic
structure of a particular molecule — a fiend- 
ish problem that requires modelling multiple 
quantum interactions. Another important 
step, says Aaronson, is demonstrating quantum
supremacy in an algorithm that uses a process
known as error correction — a method to
correct for noise-induced errors that would 
otherwise ruin a calculation. Physicists think 
this will be essential to getting quantum com- 
puters to function at scale. Google is working 
towards both of these milestones, says Martinis,
and will reveal the results of its experiments
in the coming months.

Aaronson says that the experiment Google 
devised to demonstrate quantum suprem- 
acy might have practical applications: he has 
created a protocol to use such a calculation 
to prove to a user that the bits generated by 
a quantum random-number generator really 
are random. This could be useful, for example, 
in cryptography and some cryptocurrencies, 
whose security relies on random keys. 

Google engineers had to carry out a raft of 
improvements to their hardware to run the 
algorithm, including building new electronics 
to control the quantum circuit and devising a 
new way to connect qubits, says Martinis. “This 
is really the basis of how we’re going to scale up 
in the future. We think this basic architecture 
is the way forward,” he says. 


GEORGE ROSE/GETTY 


Western Canada’s Rocky Mountains are among the geologically complex areas that Earth scientists hope to study in detail. 


EARTH SCIENTISTS 
PUSH PLAN TO MAP 
CANADA'S GEOLOGY 


A fleet of geophysical observatories would probe everything 
from the inner Earth to the upper atmosphere.


By Alexandra Witze 


Sometime in mid- to late November,
geophysicist David Eaton will head
into the forests around Fort St. John,
Canada, and help to nestle an array of
15 seismometers into the ground. They
will spend their days listening for small earth- 
quakes caused by oil and gas exploration in this 
part of British Columbia. If Eaton has his way, 
the seismometers will soon be joined by hun- 
dreds more, blanketing Canada as part of an 
unprecedented quest to probe the nation’s 
geology. 

Eaton, of the University of Calgary, is leading 
a hugely ambitious effort to establish a network
of geophysical observatories across Canada. 
The project aims to study everything from 
the inner Earth to the upper atmosphere — 
and to answer questions such as how much
Canadians should worry about earthquakes
and landslides, and where researchers should 
explore for lucrative mineral deposits or 
renewable energy resources. 

“We're completely aspirational and ambi- 
tious,” Eaton says. 

It’s not clear whether funding agencies 
will cough up the roughly Can$100 million 
(US$75 million) that’s needed to turn these 
ambitions into reality. But a wide-ranging 
group of scientists has come together to 
advocate for the project, which is known 
as EON-ROSE (Earth-system Observing 
Network-Réseau d’Observation du Système
Terrestre). Some, including Eaton, are now 
journeying into the Canadian wilderness to 
show what EON-ROSE could do. 

EON-ROSE was inspired by a massive US 
geophysics programme called EarthScope 
that is wrapping up this year. Since 2004, the 
project has covered the continental United
States with a movable network of observatories 
composed of seismometers and GPS instru- 
ments. EarthScope revealed new details about 
geological hazards — such as the enormous 
Cascadia fault zone in the Pacific Northwest, 
where researchers discovered unexpected 
slow-moving quakes that might offer clues 
to how often a massive quake is likely to strike 
the region. 

That caught the attention of scientists to the 
north. “We started talking about, why can’t we 
do this in Canada?” says Roy Hyndman, a geo- 
physicist at the Pacific Geoscience Centre in 
Sidney, British Columbia, who led many of the 
early discussions. Like EarthScope, EON-ROSE 
aims to install geophysical observatories in a
grid that would move around the country. 

The project's backers want to study Canada’s 
geologically active western mountains, along 
with the ancient rocks in its centre that date 
back roughly four billion years, and unstable 
regions in the east. Compared with the United 
States, “Canada covers an even wider portion of 
Earth history that could be investigated”, says 
Andy Frassetto, a seismologist at the Incorpo- 
rated Research Institutions for Seismology in 
Washington DC. 

EON-ROSE organizers have begun the project 
with a series of smaller studies, such as Eaton’s,
while they seek full funding. Another study took 
place this summer, when researchers from the 
Geological Survey of Canada and Geoscience 
BC descended on Mount Meager, which is in 
southern British Columbia and is Canada’s 
most recently active big volcano. Their goal 
was to explore whether its volcanic warmth 
— which heats groundwater up to 240 °C — 
could be tapped for geothermal energy. 

In July, geologists travelled around the 
mountain in helicopters to install instruments 
similar to those envisioned for EON-ROSE. The 
researchers are crunching the preliminary 
data now, aiming to see where permeable 
rocks channel Mount Meager’s volcanic heat 
towards the surface. Future studies in other 
parts of Canada could help geologists to find 
new sources of geothermal energy — such as in
the remote Arctic, where residents often rely 
on imported diesel, says Stephen Grasby, a geochemist
at the Geological Survey of Canada in
Calgary who led the work. 


Hidden treasure 


EON-ROSE also aims to identify mineral 
deposits by looking for geological structures 
deep below the surface that might underlie 
lodes of gold or copper. This approach could 
make it easier to prospect for minerals in 
the country’s northern reaches, where harsh 
winters and a shortage of roads make it difficult 
to explore. 

“You could spend forever up there wandering 
around before you discover anything,” says 
Keith Benn, a mineral-exploration consultant 
in Port Lambton, Canada. “This is the promise 
of the EON-ROSE approach — when you look at
this expansive territory in northern Canada, we 
can say, ‘we can help you decide where to start’.”

Benn is working with mining companies to 
drum up funding for a pilot EON-ROSE study of 
the ancient rocks of central Canada. 

This focus on energy and mineral exploration 
goes beyond the purely scientific aims of 
EarthScope. EON-ROSE organizers hope that a
broader focus will help them win funding from 
industry. “To move forward, we must have prac- 
tical applications that benefit Canada,” says 
Katherine Boggs, a geologist and project leader 
at Mount Royal University in Calgary. 

Ultimately, the scientists hope to get the 
bulk of their funding from the federal govern- 
ment — although Canada’s general election on 
21 October could markedly shift the outlook for 
science funding. 


PRECISE CRISPR TOOL
COULD TACKLE HOST 
OF GENETIC DISEASES 


Greater control could allow many more 
conditions to be treated with gene editing. 


By Heidi Ledford 


For all the ease with which the wildly
popular CRISPR-Cas9 gene-editing
tool alters genomes, it’s still somewhat
clunky and prone to errors and unintended
effects. Now, an alternative
offers greater control over genome edits — an 
advance that could be particularly important 
for developing gene therapies. 

The alternative method, called prime editing,
improves researchers’ chances of getting only 
the edits they want, instead of a mix of changes 
that they can’t predict. The tool, described in 
Nature (A. V. Anzalone et al. Nature http://doi.org/dczp;
2019) on 21 October, also reduces
the ‘off-target’ effects that are a key challenge 
for some uses of the standard CRISPR-Cas9 
system. That could make prime-editing-based 
gene therapies safer. 

The tool also seems capable of making a 
wider variety of edits, which might one day 
allow it to be used to treat the many genetic 
diseases that have so far stymied gene editors. 
David Liu, a chemical biologist at the Broad 
Institute of MIT and Harvard in Cambridge, 
Massachusetts, and lead author of the study, 
estimates that prime editing might help 
researchers tackle nearly 90% of the more 
than 75,000 disease-associated DNA variants 
listed in ClinVar, a database developed by the 
US National Institutes of Health. 

And the specificity of the changes that prime
editing is capable of could make it easier for
researchers to develop models of disease, or to
study specific gene functions, says Liu.

A new gene-editing tool offers more control
than CRISPR-Cas9 (pictured).

“It’s early days, but the initial results look 
fantastic,” says Brittany Adamson, who stud- 
ies DNA repair and gene editing at Princeton 
University in New Jersey. “You're going to see 
a lot of people using it.”

Prime editing might not be able to make 
the very big DNA insertions or deletions that 
CRISPR-Cas9 is capable of — so it’s unlikely to 
completely replace the well-established editing 
tool, says molecular biologist Erik Sontheimer 
at the University of Massachusetts Medical 
School in Worcester. That’s because for prime 
editing, the change that a researcher wants
to make is encoded on a strand of RNA. The
longer that strand gets, the more likely it is to
be damaged by enzymes in the cell.

PRECISION EDITOR

Prime editing reduces the number of unintended changes to a genome by inserting the edits researchers
want to make into the DNA itself. This contrasts with CRISPR-Cas9, which relies on the cell's repair system
to make the changes.

[Figure: starting DNA sequence (X X); the prime editing tool nicks one strand of DNA and inserts edited sequence Y into it (X Y); the original DNA sequence is cut off; the prime editor nicks the non-edited DNA strand; the cell repairs the nicked strand with the correct edit (Y Y).]

“Different flavours of genome-editing 
platforms are still going to be needed for 
different types of edits,” says Sontheimer. 

But prime editing seems to be more precise 
and versatile than other CRISPR alternatives. 
Those include modified versions of CRISPR- 
Cas9 that enable researchers to swap out one 
DNA letter for another, and older tools such 
as zinc-finger nucleases, which are difficult to 
tailor to each desired edit. 


Freedom through control 


CRISPR-Cas9 and prime editing both work by 
cutting DNA at a specific point in the genome.
CRISPR-Cas9 breaks both strands of DNA at 
once and then relies on the cell’s own repair 
system to patch the cuts and make the edits. 
But that repair system is unreliable and can 
insert or delete DNA letters at the points 
where the genome was cut. This can lead to 
an uncontrollable mixture of edits that vary 
between cells. 

Even when researchers include a template to 
guide the edits, the DNA repair system in most 
cells is much more likely to make those small, 
random insertions or deletions than to add a
specific sequence to the genome. That makes 
it difficult for researchers to use CRISPR-Cas9 
to overwrite a piece of DNA with a sequence of
their choosing. 

Prime editing bypasses these problems (see 
‘Precision editor’). It, too, uses Cas9 to recog- 
nize specific DNA sequences, but the prime 
editor’s Cas9 enzyme is modified to nick only 
one DNA strand. Then, a second enzyme called
reverse transcriptase, guided by a strand of 
RNA, makes the edits at the site of the cut. 

The prime-editing enzymes don’t have to 
break both DNA strands at the same time to 
make changes, freeing researchers from relying
on the cell’s genome repair system — which
they can’t control — to make the edits that they 
want. This means that prime editing could ena- 
ble the development of treatments for genetic 
diseases caused by mutations that aren't easily 
addressed by existing gene-editing tools. 

Previously, researchers, including Liu, 
thought that they would need to develop 
gene-editing tools specific to each category 
of change they wanted to make in a genome: 
insertions, deletions or DNA letter substitu- 
tions. And the options were limited when it 
came to making precise substitutions. 

An older technique, called base editing, 
which is comparable in precision to prime 
editing, chemically converts one DNA letter 
directly into another — changing a T to an 
A or a G to a C — without breaking both DNA
strands. That’s something CRISPR-Cas9 can’t 
do. Developed by Liu, base editing could be 
useful for correcting genetic diseases caused 
by single-letter mutations, including the most
common form of sickle-cell anaemia.

But base editing can’t help with genetic 
disorders caused by multi-letter mutations 
such as Tay-Sachs disease, a usually fatal illness
typically caused by the insertion of four DNA 
letters into the HEXA gene. So Liu and his col- 
leagues set out to create a precise gene-editing 
tool that gave researchers the flexibility and 
control to make multiple types of edits without 
having to create bespoke systems. 


“It’s fantastic,” says Sontheimer. “The breadth 
of the mutations that can be introduced is one 
of the biggest advances. That’s huge.” 

Liu’s team, and others, will now need to 
carefully evaluate how well the system works 
in a variety of cells and organisms. “This first
study is just the beginning — rather than the 
end — of a long-standing aspiration in the life
sciences to be able to make any DNA change at 
any position in an organism,” says Liu. 


RUSSIAN SCIENTIST EDITS 
HUMAN EGGS IN EFFORT 
TO ALTER DEAFNESS GENE 


Denis Rebrikov says he does not plan to implant gene- 
edited embryos until he gets regulatory approval. 


By David Cyranoski 


Russian biologist Denis Rebrikov has
started editing genes in human eggs 
with the goal of repairing a muta- 
tion that can cause deafness. The 
news, detailed in an e-mail he sent to 
Nature on 17 October, is the latest chapter in 
a saga that kicked off in June, when Rebrikov 
revealed his controversial intention to create 
gene-edited babies resistant to HIV using the 
popular CRISPR tool. So far, only one person 
has claimed to have created a baby from a
gene-edited embryo — the Chinese scientist 
He Jiankui, in November 2018. 

Rebrikov’s e-mail (see Q&A on page 466) 
follows a September report in the Russian 
magazine N+1, in which he said a couple who 
both have a genetic mutation that impairs 
their hearing had started procedures to col- 
lect eggs that would be used in an attempt 
to create a gene-edited baby. The eggs that 
Rebrikov has edited so far are from women 
without the genetic mutation. He says the goal 
of those experiments is to learn how to allow 
couples with the mutation to have a child with
unaffected hearing. 

He also wants to better understand poten- 
tially harmful ‘off-target’ mutations, which are 
a known challenge of using the CRISPR-Cas9 
system to edit embryos. 

Rebrikov says he does not plan to use the 
tool to create such a baby yet — and that his 
previously reported plan to apply this month 
for permission to implant gene-edited embryos
in women has been pushed back. 

Instead, he says, he will soon publish the 
results of his egg experiments, which also 
involved testing CRISPR’s ability to repair the 
gene linked to deafness, called GJB2, in body
cells taken from people with the mutation.
People with two mutated copies of GJB2 cannot
hear well without interventions such as hear- 
ing aids or cochlear implants. Rebrikov says 
that these results will lay the groundwork for 
implanting an edited embryo. 

Rebrikov adds that he has permission from a
local review board to do his research, but that 
this does not allow transfer of gene-edited eggs
into the womb and subsequent pregnancy. 

Apart from the couple who agreed to start 
undergoing egg collection, he is in discussion 
with four other couples in which both would-be 
parents have two mutated GJB2 genes, he says.

Rebrikov also provided further informa- 
tion about the couple who agreed to the 
procedures. In September, N+1 reported that
the couple hadn’t signed a consent form and 
had backed away from the idea of creating a 
gene-edited child, citing personal reasons. 


But Rebrikov now says that this is only a 
temporary hurdle. He notes that the woman 
who donated the eggs has taken a one-month 
pause while she gets a cochlear implant. 

Rebrikov also emphasized that he will not 
move forwards without approval from the 
Ministry of Health of the Russian Federation. 
“I will definitely not transfer an edited embryo 
without the permission of the regulator.” 

That might not come any time soon. Earlier 
this month, the ministry released a statement 
saying that production of gene-edited babies is
premature. Rebrikov says “it is hard to predict”
when he will get permission, but that it will be
after all the necessary safety checks.

Denis Rebrikov plans to publish his experiments to repair genes in human eggs soon.

Rebrikov shot to fame in June when he 
told Nature of his plans to make HIV-resist- 
ant babies. The news shocked international 
researchers, who feared that he was following 
in the footsteps of He Jiankui. 

Those plans involve using CRISPR to disrupt 
the same gene that He did — CCR5. The protein
made by the CCR5 gene allows HIV to enter cells,
and people with a mutated copy of this gene
are much less likely to get the virus. But many
scientists say that the benefits — possible resist- 
ance to HIV — are not worth the unknown risks 
of gene editing, because there are other ways 
to prevent HIV passing from parent to child. 

Rebrikov says he started looking for women 
with HIV who wanted to have a baby and who 
have responded poorly to HIV drugs. He 
argues that such people might be good can- 
didates for the procedure because they have 
an elevated risk of passing the virus to their 
children, although many scientists think that 
any attempt to use gene editing in embryos to
modify CCR5 is misguided. In his latest e-mail,
Rebrikov told Nature that he is still looking 
for suitable women. “But there are very few of 
them,” he says. 

In the meantime, Rebrikov has taken on 
the project to repair the GJB2 gene in human
embryos. Some scientists also question the
benefits of this procedure because hearing 
loss is not a fatal condition. “The project is 
recklessly opportunistic, clearly unethical 
and damages the credibility of a technology 
that is intended to help, not harm,” says Jennifer
Doudna, a pioneer of the CRISPR tool at
the University of California, Berkeley. 

In the wake of He’s explosive revelation, the 
World Health Organization (WHO) tasked a 
committee with developing an international 
framework to govern the clinical use of gene 
editing. In August, the WHO committee also 
launched an international registry of clini- 
cal research using gene editing in humans to 
oversee this practice. An international com- 
mission created by the US National Academy 
of Sciences, the US National Academy of Med- 
icine and the United Kingdom’s Royal Society 
is also preparing a framework to guide clini- 
cal research in germline gene editing. This 
is expected to be released by mid-2020. The 
commission will hold a public meeting on 
14-15 November to gather ideas. 

Rebrikov said last month that he wants to 
follow regulations that have been internation- 
ally agreed on when moving gene editing to 
the clinic, according to the Bloomberg news 
agency. But he also expressed frustration that 
none exists yet. 

Robin Lovell-Badge, a developmental biolo- 
gist at the Francis Crick Institute in London and 
a member of the WHO committee, says that 
Rebrikov should wait until such a framework 
has been agreed, which will take time. “This 
is not a simple matter, and it is ridiculous to
think that we can come up with global solutions
to regulation in a very complex scientific and 
potentially clinical area in a few months.” 


Denis Rebrikov


Below are edited versions of the questions 
that Nature sent to Rebrikov, and his 
answers. 


Some scientists and bioethicists say that, 
because deafness is not a life-threatening 
condition, it should not be the target of a 
risky treatment like this. 

Any new drug carries certain risks. The 
deafness model is the most appropriate 
for applying genomic editing at the zygote 
[newly fertilized egg] stage. 


In particular, scientists worry about 
off-target mutations — which are 
potentially dangerous and could be 
introduced away from the intended edit. 
Of course we worried about those. We 
have a long and reasonable algorithm
for checking off-target activity. I’d like to
discuss the algorithm for checking the 
efficiency and safety of the technology, 
rather than the method’s prematureness. 


Some also warn that because the CRISPR 
repair mechanism is inefficient, there is a 
high likelihood of producing children with 
mosaicism — a mix of edited and unedited 
cells. Are you worried about this? 

Yes. Unfortunately, due to the impossibility 
of a complete analysis of the embryo —
we only look at a biopsy of five to seven
cells — we will never be completely sure
of the absence of mosaicism in transferred
embryos. But statistically (in experiments), 
it is possible to show either the permissible 
percentage of mosaicism or its absence. 


The Russian health ministry said earlier 
this month that it follows the position
of the World Health Organization
committee: it is too early to do such 
experiments. Will you apply anyway? 
What does it mean, too soon? Lenin said, 
“yesterday was too early, tomorrow it will 
be too late.” 


Those working on international 
frameworks to guide the clinical 
application of human-embryo editing 
have suggested that, until they are done, 
clinical research should slow down. 

Are you serious? Where did you see the 
researcher willing to slow down? 


ANDREY RUDAKOV/BLOOMBERG/GETTY 
This rat-sized Liaoconodon hui is one of many fossils from northern China that are sharpening the picture of how mammal traits evolved.


THE MAKING
OF MAMMALS 


An explosion of fossil finds is allowing our early mammal ancestors
to leap out of the shadow of the dinosaurs. By John Pickrell 


Night is falling in the early Jurassic
185 million years ago, and the Kay- 
entatherium is tending to her newly 
hatched brood. Heavy rains pum- 
mel the bank above her den as she 
looks over her dozens of tiny young. 
She is about the size of a large cat 
and could easily pass for amammal, 
but her large jawbone, characteristic teeth and
lack of external ears give her away: she is a cynodont,
a member of the group from which mammals
evolved. At some point without warning,
the sodden bank collapses, entombing the 
hatchlings and their mother in mud. 

There they remained until the summer of 
2000, when a fossil-hunting crew led by Tim- 
othy Rowe at the University of Texas at Austin 
chanced upon their scattered bones among 
rocks of the Kayenta Formation in northern 
Arizona. 

That initial encounter with the fossils did 
little to impress the palaeontologists. They 
dug up the block and shipped it back to the
laboratory for safekeeping. It wasn’t until 
nine years later that a specialist preparing the 
fossil for study noticed something startling: 
embedded in the block were tiny teeth, and 
jawbones just 1 centimetre in length. “Immedi- 
ately they stopped the preparation and thought 
about ways of non-destructively examining 
the babies,” says Eva Hoffman, at Texas with 
Rowe at the time and now a palaeontologist 
at the American Museum of Natural History 
in New York City. Instead of breaking into the 
rock, Hoffman and Rowe digitally extracted 
the bones with a microcomputed tomography
(microCT) scanner, which uses X-rays to create 
fine-grained 3D images. 

What they found inside the rock were the first 
known babies of mammals or their relatives 
from the Jurassic — and not just one, but 38 of 
them, placing this among the most significant 
discoveries related to mammal origins made 
in the past decade¹. Kayentatherium is at the
cusp of mammalhood — and researchers say
that it provides crucial insights into which traits 
define mammals and which were present in 
their earlier relatives. 

Kayentatherium’s skeleton is mammal-like 
in many ways, but the fossil suggested that it 
still reproduced very much like a reptile, giving
birth to large litters of small-brained offspring. 
By contrast, “mammal moms invest a lot in a
smaller number of babies, each of which has 
a better chance of surviving”, says Hoffman. 
Mammal babies spend longer under their 
parents’ care, developing relatively large 
brains, whereas these fossil hatchlings had 
well-developed bones and teeth, hinting that 
they could fend for themselves and were not 
nourished by milk, as all mammals are today. 

The find is among a mass of discoveries 
in the past 10-20 years that are illuminating 
milestones in mammalian evolution. Although 
major finds are emerging all over the world, 
the largest number are coming out of China; 
together, they have overturned the now dated
belief that dinosaur-era mammals were small, 
unremarkable insectivores, eking out a life in 
the shadows of the giant reptiles. 

The fossils have revealed that early mammals 
were ecologically diverse and experimenting 
in gliding, swimming, burrowing and climbing. 
The discoveries are also starting to reveal the 
evolutionary origins of many of the key traits 
of mammals — such as lactation, large brains
and superbly keen senses. 

“The explosion of early-mammal discover- 
ies, particularly from China, over the last two 
decades has been eye-opening, mind-numbing 
and absolutely dazzling,” says David Krause, 
a vertebrate palaeontologist at the Denver 
Museum of Nature and Science in Colorado. 

This avalanche of discovery is also stirring up 
debate: some researchers disagree over which 
fossil groups are true mammals and the shape 
of the mammal family tree. “We want to understand
our early history in the language of evolutionary
biology, and that’s what fires me up,” says
Zhe-Xi Luo, a palaeontologist at the University 
of Chicago in Illinois. “That’s why this entire 
field is so interesting, because the fossil record 
is getting better and better, and we are starting 
to really tackle some of these questions.” 


Out of the shadows 


In 1824, at the Geological Society of London, 
naturalist William Buckland presented bones 
from one of the first known dinosaurs, Megalosaurus.
At the same talk, he revealed tiny mammalian
jaws that had been found in the same
fossil deposit. Their presence suggested that 
mammals had a very deep history, but as would 
happen repeatedly, the dinosaur discoveries 
completely overshadowed the mammal ones. 

The slow trickle of mammal finds from 
around the world continued for 150 years. Then 
in 1997, researchers described the first ancient
mammal from the fossil-rich rocks of Liaoning
in northeastern China², and the floodgates
opened. Since then, 50 or more near-complete 
and “beautiful specimens” have been found 
there, according to Jin Meng, a palaeontologist
at the American Museum of Natural History. 
Like the dinosaur fossils, they are dug up by 
local farmers and sold on to museums. 

But the dinosaurs continued to get the vast 
majority of the attention, says palaeontologist 
Steve Brusatte at the University of Edinburgh, 
UK. “It’s only that very recently, through 
the work of Luo, Meng and others, that the 
mammals are getting their due.” 

Most of China’s mammal fossils were formed 
when volcanoes buried the animals in ash — and 
they are exquisitely detailed. Typical mammal 
fossils from the Mesozoic era (252 million to 
66 million years ago) are little more than teeth 
and jaw fragments, but Chinese specimens 
often have entire skeletons, with fur, skin and 
internal organs. “We have a lot of detail to 
answer scientific questions,” says Meng. He is
interested in understanding the evolution of
the mammalian ear, for instance. 

The finds overturned previous dogma. “We 
used to say that during the time of dinosaurs, 
mammals were totally unspectacular. That they 
were just these little mousey things scampering
around in the shadows,” says Brusatte. But
these animals “were undergoing their own 
evolutionary explosion”, he says. 

Mammals first appeared at least 178 million 
years ago, and scampered amid the dinosaurs 
until the majority of those beasts, with the 
exception of the birds, were wiped out 66 mil- 
lion years ago. But mammals didn’t have to wait 
for that extinction to diversify into many forms 
and species. “These new discoveries document 
a huge, hitherto-undreamed-of ecological 
diversity,” says Richard Cifelli, a palaeontolo- 
gist at the University of Oklahoma in Norman. 

Among the first innovations that researchers 
began to find in fossil form were those to 
do with locomotion. In 2006, Meng’s team 
reported the first gliding mammal³, 164-million-year-old
Volaticotherium, which had wing
membranes made of furry skin, like today’s 
flying squirrels. In 2017, Luo’s team added 
Vilevolodon and Maiopatagium⁴,⁵, which
lived at around the same time and belonged
to a group called the haramiyids. These animals
swooped between the trees alongside some 
of the first flying dinosaurs, taking advantage 
of previously unexploitable food resources. 

Researchers found other specializations 
that they assumed had evolved only later: 
Agilodocodon could climb trees and gnawed 
into bark to feast on sap⁶; the platypus-sized
river-dweller Castorocauda had webbed feet 
and a beaver-like tail for swimming⁷; and Docofossor
had paws and claws for digging, and
looked like a modern mole⁸.

These mammals had also adapted to a 
multitude of diets, much more diverse than 
previously assumed. In 2014, Krause described 
the groundhog-like Vintana from Madagascar, 
a herbivore that perhaps fed on roots 
and seeds. And the wolverine-sized carnivore 
Repenomamus, which Meng’s team reported 
in 2005, had baby dinosaur bones in its stomach. 
Many of these new-found fossil mammals 
belong to long-extinct subgroups, says Meng. 
In contrast to the panoply that existed in the 
Mesozoic, mammals today come in just three 
varieties: placentals, which make up the major- 
ity of species and include humans; marsupials, 
such as kangaroos and koalas, in which gesta- 
tion in the womb is brief and development 
continues in a pouch; and the egg-laying mono- 
tremes, represented only by the platypus and 
several echidnas. “But in geological history, 
there were many other groups such as multituberculates, 
triconodonts and haramiyids,” says 
Meng. “Mammals were actually very diverse in 
the Jurassic.” 

Some, such as the shrew-like Juramaia — 
described by Luo’s team in 2011 and dated to 
160 million years ago — are among the earliest 
placental mammals and therefore have the 
potential to be our ancestors. 

And a few dinosaur-era mammals were 
much bigger than suspected, too. Repe- 
nomamus was 12-14 kilograms, and the 
racoon-sized Vintana weighed in at 9 kg. “It’s 
exciting that we kind of busted the old myths 
that early mammals came from a very humble 
generalized ancestor,” says Luo. 

The finds are not solely from China. Important 
fossils are also coming from the United States, 
Spain, Brazil, Argentina, Madagascar and Mon- 
golia. Some of the most intriguing and oldest 
fossils — as well as the biggest gaps in our knowl- 
edge — relate to the southern continents, where 
only five genera of Mesozoic mammals and their 
relatives are known, compared with more than 
70 genera from northern latitudes. In the past 
two decades, Brazil has yielded several Trias- 
sic fossils that are more than 200 million years 
old. Guillermo Rougier, a palaeontologist at the 
University of Louisville in Kentucky, describes 
them as “incredible discoveries” that are right 
on the cusp between mammals and their cynodont 
ancestors. “These forms really show a very 
transitional progression from things that are 
typically non-mammalian, to things that pretty 
much have all the features of early mammals.” 


Mammal must-haves 


The latest finds are also offering clues to 
the evolution of key mammal features. For 
instance, the keen hearing of mammals is 
partly down to tiny bones in the middle ear — 
the malleus, incus and ectotympanic. But in the 
reptilian ancestors of mammals, these bones 
were part of the jaw, and were used for chewing 
instead of hearing. Mammal forerunners, such 
as shrew-like Morganucodon from 205 million 
years ago, sported a prototype of the mammal 
arrangement that allowed for both functions. 

Discoveries in the past two decades show 
that early mammals were a diverse bunch, 
with sophisticated skills such as gliding and 
burrowing that researchers thought evolved 
only later. Many of the features that define 
mammals, such as suckling milk, exceptional 
hearing and small litter sizes, had already 
appeared by the time true mammals were 
roaming the land, rivers and skies. 

Many archetypal mammalian features 
evolved in a short burst early in mammal 
evolution, including innovations in 
movement, development and diet. 

SKY GLIDERS 
Several modern mammals glide 
on wings of stretched skin, but 
the exquisitely preserved furry 
membranes of Jurassic-era 
Maiopatagium furculiferum revealed 
that this ability evolved early, by 
160 million years ago. Squirrel-sized 
Maiopatagium probably feasted on 
fruit, but other gliders from the same 
period had teeth more suited to seeds. 

Fur-covered skin membrane 
stretches between front and 
hind limbs 

This exquisitely preserved 160-million-year-old specimen of Maiopatagium furculiferum 
shows how early gliding evolved. 

In 2011, Meng reported an intermediary: 
a 120-million-year-old specimen from China 
belonging to a group of mammals called 
eutriconodonts and named Liaoconodon hui 
(see ‘Mammal hallmarks’). The rat-sized fossil 
revealed three middle-ear bones, but they were 
still attached to the jaw by cartilage. “The 
hearing function and the chewing function 
were still not completely separated,” he 
explains. This was hard evidence of the 
evolutionary transition from jaw to ear. 

Another unique trait of mammals is the 
sophisticated way they chew and ingest food 
in small parcels, rather than swallowing things 
whole as snakes and alligators do. To make that 
possible, mammals evolved a wide variety of 
complex teeth for biting and grinding food. 

But as babies, mammals are nourished 
another way — by suckling from their mother’s 
mammary glands. “Our whole group is named 
after this incredible biological innovation,” says 
Luo. Drinking milk is made possible by the ability 
to suck and swallow, aided by the hyoid 
bones in the throat and muscles that support 
them. This apparatus also forms the voice box. 

In July, Luo published a paper revealing a 
165-million-year-old vole-sized docodont — a 
close relative of true mammals — that had the 
hyoid bones of its throat preserved. Microdocodon 
gracilis is the earliest animal known to 
have been able to suckle like a modern mammal. 
This level of detail is rare, and — similar to the 
study of the Kayentatherium hatchlings — the 
work on both the ear and throat bones has 
been made possible only through advances in 
microCT scanning techniques, says Krause. The 
technique has also revealed details about the 
olfactory abilities and brains of early mammals. 
These revelations are “breathing life into these 
early mammals in ways that were previously 
impossible and almost inconceivable”, he says. 

Much of the constellation of features we 
think of as defining mammals — complex 
teeth, excellent senses, lactation, small litter 
size — might actually have evolved before 
true mammals, and quite quickly. “More and 
more it looks like it all came out in a very short 
burst of evolutionary experimentation,” Luo 
says. By the time mammal-like creatures were 
roaming around in the Mesozoic, he says, “the 
lineage has already acquired its modern look 
and modern biological adaptations”. 


Family drama 

Although the experts concur on many points, 
there is still much debate about how early mammal 
groups are related, and which groups are 
true mammals. That leads to uncertainty about 
how key traits evolved, says Hoffman. 

One sticking point between Meng and Luo, 
who have each developed their own evolu- 
tionary trees, is the haramiyids. Meng thinks 
this early group belongs with true mammals, 
whereas Luo is convinced it’s a side branch. 
The oldest known haramiyids are from 208 mil- 
lion years ago in the Triassic. If they are true 
mammals, then mammal origins date back at 
least that far — if not, then the oldest known 
mammal is 178 million years old, well into the 
Jurassic. 

More fossils will help to resolve such 
questions, and bring more surprises. Krause 
and Meng say they are both studying exciting 
fossils, but are yet to publish their findings on 
them, and tens of unstudied specimens lie piled 
in the offices of their Chinese colleagues. 

Palaeontologists have many items on their 
wish lists. One characteristic that Luo wants 
to understand is growth rates. Reptiles grow 
slowly throughout their lives, whereas mammals 
grow in bursts in youth and then plateau. 
He’d love to find a series of fossils from babies 
to adults to watch this happening. “That is one 
of the most critical features in mammals that 
help to define our biology,” he says. 

Both Hoffman and Meng agree that embryos 
and more babies would be significant finds — 
and, like the Kayentatherium discovery with 
its dozens of hatchlings, they would help us 
to pinpoint the date that mammal-style small 
litter sizes appeared. Meng’s dream is to find a 
pregnant mammal. “This is always in my mind 
that I will find a mammal that inside the skele- 
ton you can see some very delicate skeleton, 
which is either an egg that hasn’t hatched, or 
it’s a more interesting fetus,” he says. 

If the flurry of discoveries has taught 
researchers anything, it’s that every fossil find 
has the potential to add a chapter to evolution- 
ary history or even flip the prevailing narrative 
on its head. “We're really in this exciting, almost 
manic phase of lots of new evidence coming 
in, and it’s going to take time to synthesize,” 
says Brusatte. 


John Pickrell is a science journalist and author 
in Sydney, Australia. 


A technician monitors cryptocurrency-mining rigs at a Bitfarms facility in Saint-Hyacinthe, Canada. 


The storied state of economics 


Robert Shiller’s study probes how social behaviour supersedes 
statistics in determining the fate of economies. By Tim Jackson 


“Economists are tellers of stories 
and makers of poems,” wrote 
the economic historian Deirdre 
McCloskey in 1990. It’s a curious 
observation for a profession that 
prides itself on hard-nosed, quantitative analy- 
sis and strives continually for predictive power. 
The Nobel-prizewinning economist Robert 
Shiller goes even further. 

Stories are more powerful than statistics, 
he claims. The irrationality inherent in finan- 
cial exuberance (and despair) defies the neat 
territory of numbers and demands a deeper 
excursion into the decidedly unruly world of 
narratives. That is the declared aim of his book 
Narrative Economics. 

It’s a compelling hypothesis. Since the 1960s, 
we have known that science is socially con- 
structed. Since the 1980s, sociologists have 
sought to understand the ‘social amplification 
of risk’ — in which people are drawn inexorably 
towards stories of disaster or triumph (rather 
than statistics or probabilities) as the lodestone 
for the perceptions of risk that guide their 
everyday decisions. Around the same time, 
philanthropist George Soros adapted the concept 
of reflexivity to explain how investors’ perceptions 
affect the social environment, which, 
in turn, informs their perceptions. 

Narrative Economics: How Stories Go Viral and Drive Major Economic Events 
Robert J. Shiller 
Princeton University Press (2019) 

This feedback loop allows speculative 
bubbles to arise with alarming speed, and then 
collapse again. The phenomenon reached its 
apotheosis in a now infamous remark from 
Citibank chief executive Chuck Prince that 
“when the music stops, in terms of liquidity, 
things will be complicated. But as long as the 
music is playing, you’ve got to get up and 
dance.” His prophetic words came just months 
before the 2007-08 financial crisis struck. 

Shiller elevates these insights into a full- 
blown exploration of the multiple ways in 
which narratives influence economic behav- 
iour. Much as he tracked the rise and fall of 
asset prices in his Nobel-prizewinning work, 
he now charts the flux of narrative memes using 
Google’s Ngram Viewer — which allows users 
to track the frequency of words and phrases 
in text over time — and ProQuest’s database of 
news citations. It’s a quaint device, and there’s 
a deceptive similarity between the time-series 
graphics in Narrative Economics and those in 
his bestselling book Irrational Exuberance 
(2000). But the message is effective: the value 
of your story might go up as well as down. 

The empirical core of the book is a detailed 
exploration of numerous real-life case stud- 
ies, ranging from bimetallism (an old-fash- 
ioned form of money) to bitcoin (a brand-new 
one), and from the Great Depression of the 
1930s to the Great Recession of recent years. 
Along the way, his anecdotes form a fascinat- 
ing subscript. A convincing case is made, for 
instance, that fears of a ‘singularity’ — a point of 
no return arising from technological advances 
— are perennial. He notes numerous viral out- 
bursts of this meme (associated with cotton 
mills, electricity and computers, for instance) 
dating back to the nineteenth century. Today’s 
apocalyptic anxieties about a robot takeover 
are nothing new and should not be heeded, 
Shiller seems to imply. How that will turn out 
remains to be seen. 

We learn that the mechanism through which 
a memorable turn of phrase goes viral can be 
described as a form of contagion, mirroring 
models from epidemiology. But we are also 
persuaded that viral success depends inher- 
ently on the messenger. Few remember that the 
phrase “the only thing we have to fear is fear 
itself”, immortalized by US president Franklin D. 
Roosevelt during the Great Depression, was first 
uttered years earlier by economist Irving Fisher. 
It’s troubling, of course, to be reminded that the 
rewards for creativity are often misallocated 
— particularly in today’s plagiaristic world of 
social media, with its immense powers of nar- 
rative acceleration. But for me, this particular 
example raised a deeper concern. 

Fear is a rational response from people whose 
livelihoods are under existential threat. So 
why would a president inveigh against it? The 
answer is that Roosevelt was painfully aware 
of the implications of fear. He was addressing 
what the economist John Maynard Keynes 
(borrowing from another long-forgotten crea- 
tive) called the “paradox of thrift”: the tendency 
of ordinary people to curtail their consumption 
in the face of economic uncertainty, and put 
their money into savings instead. 

Such behaviour is sensible, admirable even, 
at the individual level. Perhaps it is so at the 
planetary level, too: lower consumption might 
benefit the environment. But economics has a 
problem with it. As people spend less, demand is 
suppressed, prolonging the recession. The same 
thing happened after the 2007-08 crisis. The 
paradox of thrift was the foundation for Keynes's 
most famous proposal: that governments pro- 
vide stimulus that could return the economy 
to growth when people would not. This was the 
rationale for Roosevelt’s New Deal package of 
reforms, and the inspiration for the proposed 
US legislation called the Green New Deal. 

Keynes was a pragmatist; his prescriptions 
were a response to the diseases of the day. But 
he was also a visionary. In his essay ‘Economic 
Possibilities for Our Grandchildren’ (1930), 
he foresaw a time when our society would 
move beyond growth. It hasn’t happened yet 
— in spite of economist Kenneth Boulding’s 
remark to the US Congress in 1973 that “any- 
one who believes exponential growth can go 
on forever in a finite world is either a madman 
or an economist”. 

Shiller is clearly not a madman. But in the 
course of an otherwise fascinating exploration 
of the power of story, he never once acknowledges 
that eternal growth is itself just a narrative. 
He notes that the logic of relentless 
expansion conflicts with the logic of human 
anxiety. But he assumes that it is people who 
are at fault. Narratives can have clear, moral 
and prudential foundations, it seems, but they 
might still be cast as irrational. 

Indeed, for Shiller, that memorable speech 
on the “fear of fear” shows government 
attempting to “lean against false or misleading 
narratives and establish a moral authority 
against them”. Roosevelt’s remark was designed 
to “create and disseminate counternarratives 
that establish more rational and more pub- 
lic-spirited economic behavior”. What Shiller 
seems to be saying is this: when ordinary 
human sentiment runs counter to the prevail- 
ing logic of capitalism, the state must override 
it. It is a deeply suspect, potentially dangerous 
conclusion. But it, too, demonstrates just how 
pervasive narrative is. 

Ultimately, Narrative Economics is an 
eloquent and accessible exposition of a 
seductive idea. It’s a particularly compelling 
hypothesis for Britain, a country still reeling 
from a public referendum whose outcome was 
determined by viral confabulations of the most 
pernicious kind. We are all “tellers of stories 
and makers of poems”. But neither economists 
nor politicians can claim moral authority over 
narrative truth. We must all choose carefully 
which stories we live by. 


Tim Jackson is director of the Centre for the 
Understanding of Sustainable Prosperity at 
the University of Surrey in Guildford, UK, and 
author of Prosperity without Growth. 

e-mail: t.jackson@surrey.ac.uk 


Testosterone chronicles: 
truths and tall tales 


A book on the hormone dissects fact from fake and 
questions interpretations. By Randi Hutter Epstein 


On 1 June 1889, renowned neurologist 
Charles-Édouard Brown-Séquard 
shocked his colleagues. Speaking 
at the Paris Society of Biology, the 
72-year-old announced that a slurry 
made from the ground testicles of guinea pigs 
and dogs (injected under his skin ten times in 
three weeks) made him stronger. He also noted 
that his “jet of urine” lengthened by 25%. 

Brown-Séquard was ridiculed by his peers 
throughout Europe for disseminating results 
with no scientific basis and promoting quack 
youth-enhancing ‘cures’. Yet the bizarre elixir 
found favour with members of the public 
in the United States, United Kingdom and 
Europe — at least among men eager to recap- 
ture youthful sexual prowess. As the engaging 
book Testosterone explains, Brown-Séquard’s 
testimonial helped to shape future studies that 
linked the hormone to alleged ‘manliness’. 

Anthropologist Katrina Karkazis and 
sociomedical scientist Rebecca Jordan-Young 
did not write Testosterone to rehash familiar 
tales of wacky hormone experiments of yore, 
although this is one of a few that they include. 
Their contention is that many testosterone 
researchers — then and now, and intention- 
ally or not — interpret data with blinkers on. 
When the facts do not fit the paradigm, the 
authors argue, findings are moulded into 
flawed dogma. Karkazis and Jordan-Young 
strive to comprehend how scientific practice 
around testosterone unfolds, and explore how 
the results “circulate and morph in the world”. 

Testosterone: An Unauthorized Biography 
Rebecca M. Jordan-Young, Katrina Karkazis 
Harvard University Press (2019) 

Polarized-light micrograph of crystals of testosterone. 

Today, the biochemistry of this steroid hormone 
is well known, from its daily fluctuations 
to its synthesis from cholesterol and occasional 
conversion to oestradiol, a form of oestrogen. 
Testosterone is known to restore sex drive 
and muscle tone among men with ailments 
that reduce levels of the hormone, such as 
pituitary tumours. During puberty, a surge of 
testosterone in young men typically leads to 
enlargement of the muscles, penis, testes and 
prostate gland, and the emergence of secondary 
sex characteristics. In women, testosterone 
excreted by the adrenal glands and ovaries is 
generally important for ovarian function and 
bone strength. 

Like pathologists doing a post mortem, 
Karkazis and Jordan-Young dissect the remains 
of a selection of studies. They parse statistics 
and the cultural context that prompted the 
research and influenced how the data were 
analysed. (Full disclosure: I have served on a 
history of medicine panel with Karkazis and, as 
medical authors writing about endocrinology, 
our paths have crossed several times.) 

The authors delve first into testosterone’s 
role in ovulation. The hormone and its 
precursor, DHEA, have a role in the maturation 
of ovarian cells; DHEA might boost fertility 
directly or as a mediator of oestrogen production. 
There are chapters focusing on traits often 
assumed to be associated with testosterone, 
such as athleticism. The authors also scrutinize 
the brouhaha surrounding a small psychology 
study claiming that holding particular poses 
boosts testosterone production. There is a 
section on parenting, thanks to studies that 
created a fleeting media buzz by claiming that 
new fathers’ testosterone plummets when 
they change nappies and do other nurturing 
chores. And the authors discuss athletes who 
take testosterone to boost their abilities. 

They do not dispute that injections, gels 
or patches that send testosterone levels sky- 
rocketing above the norm build muscles when 
coupled with intense training. But they are 
sceptical about whether the hormone makes 
a large difference for every athlete. Some 
studies, they write, have found a correlation 
between high natural testosterone levels and 
speed and power; others show tenuous or no 
links. And a few studies link higher testosterone 
levels to worse performance. 

Jordan-Young and Karkazis challenge murky 
definitions. They show how researchers define 
risk-taking through “weirdly narrow and also 
wildly divergent” behaviours, such as riding a 
motorcycle without a helmet. They cite a team 
that surveyed business students about their 
entrepreneurial experience and used a saliva 
sample to gauge their testosterone levels. On 
the basis of these dubious data, the investiga- 
tors concluded that those who had the highest 
levels, coupled with family business experi- 
ence, were the most entrepreneurial. 

When it comes to testosterone and aggres- 
sion, the authors say that some of the most rigor- 
ous studies (double-blind, placebo-controlled) 
show no connection. What’s more, they write 
that even the investigators of studies that tie 
testosterone to violence acknowledge that the 
link is inconsistent and weak. Yet the idea that 
testosterone drives violence remains widely 
accepted and “grossly overblown”. 

By setting the record straight, the authors 
build on their past record. Jordan-Young 
explored the evidence for putative neuro- 
logical sex differences in the 2011 book Brain 
Storm; Karkazis demolished preconceptions 
about people who are intersex in her 2008 
work Fixing Sex, which also explores the often 
disturbing history of ‘treatments’ for ‘ambiguous’ 
genitalia. 

Although often academic in tone, the book 
is leavened by a welcome informality. The 
authors describe the link between testoster- 
one and violence as a zombie: “a fact that seem- 
ingly can’t be killed with new research”. They 
personify testosterone as “T” and characterize 
their book as an “unauthorized biography”. An 
authorized biography, they note, “sweeps away 
all kinds of details and smooths over contradic- 
tions”. Theirs intends to pull back a veil that has 
obscured the field. 

Still, I sometimes wanted more. In a chapter 
on ovulation, they quote a woman receiving 
fertility treatment who thinks that a therapy 
containing DHEA helped her to produce more 
eggs of higher quality. The authors note that 
the idea of testosterone aiding a woman’s 
fertility has been anathema to reproductive 
endocrinologists, but quote only one clinician. 
That left me wondering whether other clini- 
cians were still reluctant, or if this were part 
of standard treatment. I wanted to hear from 
other fertility clinicians. 

In the opening of the chapter on athleticism, 
the authors refer to a 2012 meeting with an 
endocrinologist who explains that testoster- 
one rises sharply in response to intense exer- 
cise, but that responses vary among athletes. 
Then, they describe an interview with a second 
expert who tells them the opposite, and also 
says that some types of sports training might 
lower testosterone. I wanted to know who these 
experts were. 

Moreover, although Jordan-Young and Kar- 
kazis are lively storytellers, every now and then 
an anecdote doesn’t jibe with the chapter’s 
content. For example, they start the discus- 
sion on risk-taking with a delightful account 
of 63-year-old Annie Edson Taylor, who in 1901 
went over Niagara Falls on the US-Canadian 
border in a pickle barrel. That seems a literary 
stretch: we know nothing of Taylor’s hormonal 
state (except that because she was probably 
postmenopausal, her testosterone would have 
been low, and her oestrogen and progesterone 
certainly lower than before). 

These quibbles, however, are minor in a 
deeply researched and thoughtful book that 
adds a fresh perspective to a growing body of 
work aiming to debunk myths about hormones. 


Randi Hutter Epstein is writer in residence 
at Yale School of Medicine in New Haven, 
Connecticut, and author of Aroused: The 
History of Hormones and How They Control 
Just About Everything and Get Me Out: A 
History of Childbirth from the Garden of Eden 
to the Sperm Bank. 

e-mail: rh152@columbia.edu 


Science by design: 
Nature renewed 


From custom typeface to digital-friendly logo, follow 
the journey to the journal’s new look. By Kelly Krause 


Should science be ugly? This is a serious 
question asked by serious people at 
seminars. Some assume that an aesthetically 
appealing presentation signals 
at best a lack of priorities, and at 
worst a lack of rigour. 

I disagree. Science sorely needs best prac- 
tices in visual communication as well as in 
information design, a mature field with quan- 
titative methods. In my view, the idea that 
scholarly publishing should be divorced from 
evidence-based applications of good visual 
design is perplexing. 




Looking back over the past 150 years of 
Nature, we see an aesthetic that bends with 
time and trends, from ornate Victorian embel- 
lishments in 1869 to stark minimalism in the 
late 1960s. But design is not solely about how 
something looks; it is also concerned with how 
it works, and that understanding has never 
been more urgent than in the digital age. Design 
as a discipline exists to solve problems, and 
working researchers, readers and contributors 
have many. As publishers, we’ve asked how we 
might assist working scientists. We have heard 
your pleas, many stemming from information 
overload and the need to pack ever more data 
onto small screens. 

So we are refreshing Nature’s look, and not 
just in honour of our 150th anniversary. We are 
in the early stages of our evolution towards 
designing for readers’ digital reality. Here is a 
tour of what is different, and why and how we 
have changed it. 


Typography 

A custom typeface, Harding, has been created 
for Nature's new logo and much else: you’re 
reading it right now. Harding is named after 
the late neurologist Anita Harding. Brilliant and 
generous, she published in Nature before she 
died in 1995 at age 42. According to colleagues, 
she was known for taking questions from the 
clinic back into the laboratory, and for her wry 
sense of humour. When she learnt that she had 
a terminal illness, she apparently joked that 
at least she wouldn't have to buy Windows 95. 

Our team designed the typeface specifi- 
cally for science, in partnership with Com- 
mercial Type founders Christian Schwartz 
in New York and Paul Barnes in London, and 
with London designer Mark Porter, whom 
Nature engaged for the overall redesign. Care 
was taken to identify the needs of technical 
material, because scholarly articles use clas- 
sic type styles in unique ways. For instance, 
papers often have mathematical equations 
and formulae in the sub- and superscript 
lines, along with Greek letters and special 
characters. So we have made the sub- and 
superscript characters larger than stand- 
ard, and created a Greek alphabet carefully 
honed to convey scientific meaning rather 
than typical Greek-language prose — for 
example, clearly rendering an alpha (α) in a 
shape that looks like a mathematical symbol, 
so that it is not easily confused with a Latin 
italic letter a. We have also made the italics 
more slanted so they are more distinct; single 
italic characters, such as h for Planck’s 
constant, are often used as isolated symbols 
with scientific meaning. 

Harding is designed to cope across the 
disciplines. It boasts an unusually large range 
of special characters, from triple prime and 
nabla to a full set of astronomical symbols and 
the ‘click’ phonemes found in some African 
languages. 

A key consideration in Harding’s over- 
all design is performance on small digital 
screens. To boost readability in a limited 
space, it helps to enlarge the main portion 
of the lower-case letters, while making the 
ascenders and descenders (as in ‘h’ and ‘g’, 
respectively) smaller. Ultimately, this ren- 
ders long, complex strings of words easier 
to parse, and allows for neat stacking of 
lengthy technical research-article titles over 
a number of lines. 

The ‘flavour’ of the typeface — the feelings 
it evokes, its personality — evolved over sev- 
eral months. We initially looked at six fledg- 
ling concepts, each with distinct letterforms 
such as rounded serifs (the small strokes at 
the end of letters). After we winnowed these 
down to two, Harding emerged as the clear 
winner. We aimed for an overall impression 
of calm, rational intelligence with perhaps a 
dash of British formality and wit. 

The old logo 
A selection of logos from Nature’s past. The lower-case ‘n’ was introduced in 1974. 

A matter of contrast 
In typography, contrast refers to weight contrast, which is the difference 
between thick and thin strokes, shown lower right. Nature’s outgoing logo is 
very high contrast, which is not suitable for digital, particularly mobile, devices. 

Designing a typeface 
Harding's unusually high x-height is engineered for optimum legibility at small 
sizes, particularly on mobile screens, but also in print. The short ascenders and 
descenders help to stack long strings of words into small spaces. 
Labels on the sample word ‘Harding’: ascender, descender. 

Evolution 
Nature's updated logo is based on our new typeface, Harding. In this 
illustration, the standard Harding letters are light blue, with the new logo 
type in dotted black outline. 

Final logo 
For the logo, we added a ball terminal to the ‘r’ and a tail to the ‘a’, in a 
nod to logos of the recent past. 

© 2019 Springer Nature Limited. All rights reserved. 

The myriad design considerations behind 
Nature’s new typeface serve one goal: to 
improve the reading experience for research- 
ers and policymakers globally, and enhance 
comprehension and insight. 


Logo 

Nature has had at least ten logos since 1869, 
reflecting the styles of successive eras (see 
‘The evolution of Nature’). All, up to now, were 
designed for print. The relatively large, luxu- 
rious physical space of a printed cover allows 
for fine detail in a way mobile devices do not. 
Finely worked features, digitized into pixels, 
can look fuzzy on smartphone screens. 

Using the Harding typeface as a basis, we 
have updated the logo. Weight contrast — the 
variation of thin and thick lines in a letter- 
form — was an important factor because 
high contrast worsens pixellation on the small 
screen. For the logo, we modified Harding 
slightly, adding a tail to the letter ‘a’ and a 
rounder terminal on the ‘r’, to align it with 
recent versions of the logo. There’s an echo 
of the Baskerville Old Style logo from the 
early 1970s, but engineered for digital per- 
formance. And our team has retained the 
democratic lower-case ‘n’ that has been in 
use for almost 50 years. 

We have also prioritized digital platforms 
by simplifying all Nature-branded jour- 
nal logos. This was a particular challenge 
for journals with very long names, such as 
Nature Structural and Molecular Biology. 
Because social-media channels have grown 
in importance, we have also created a system 
of abbreviated forms for the tiny avatars on 
those platforms. 


Colour 


Perhaps the most radical change to Nature's 
look is the removal of the red bar from the top 
of its web page. The journal design has incor- 
porated red only since the late 1990s. (Before 
that, orange persisted in the logo and printed 
pages for four decades.) Nature’s ‘red period’ 
was intertwined with the rise of the web, but 
as digital design language has matured, red 
is now often associated with unpleasant 
online features such as error messages. More 
importantly, by removing the red, we help 
content to stand out more cleanly. Research 
shows that elimination of unnecessary colour 
elements eases cognitive load. 

All these elements — typography, logos 
and colour — form the basis of Nature’s new 
design language across digital, print and any- 
where you might find us, from coffee table 
to Twitter feed. This language will most cer- 
tainly evolve, driven by researchers’ needs. 


Kelly Krause is Nature’s creative director. 

Engineer Adriano Olivetti in his typing machine factory in Ivrea, Italy. 


Turbulent birth of the 
personal computer 


The strange circumstances surrounding the invention 
of the world’s first PC are probed by a new book. 
By Sharon Weinberger 

In the depths of the cold war, an Italian 
industrialist on the cusp of marketing the 
first personal computer dies on a train to 
Switzerland. Adriano Olivetti has had con- 
tact with Western spy agencies; his asso- 
ciates hint that his heart attack might not be 
what it seems. Such is the thriller-esque start 
to biographer Meryle Secrest’s The Mysterious 
Affair at Olivetti. 

At the heart of Secrest’s book lie two ques- 
tions: how did the Italian typewriter company 
Olivetti produce the world’s first PC in the 
1960s — long before its competitors — only 
to have its work fall into obscurity? And 
could Adriano Olivetti’s death be linked to 
the company’s disappearance from computer 
history? Secrest weaves a startling narrative 
around these events, involving a US intelli- 
gence agency and an information-technology 
multinational. 

The story goes back to Camillo Olivetti, 
the Jewish-Italian industrialist who founded 
the company in Ivrea, Piedmont, in 1908. His 
visionary son Adriano, who succeeded him as 
company head in 1938, was interested in archi- 
tecture, politics and technology. He began to 
look beyond typewriters to machines combin- 
ing the best aspects of form and function. More 
crucially, he started to expand from mechani- 
cal typewriters into electronics. 

When the Second World War broke out, 
Adriano Olivetti paid lip service to the fascists 
while secretly working to remove prime minis- 
ter Benito Mussolini, all while keeping his fac- 
tory going and his family alive. He survived the 
war, the company thrived, and he opened an 
electronics laboratory that drew on his expe- 
rience in the United States. In the late 1950s, 
the company created one of the world’s first 
transistorized mainframes, the ELEA 9003. 

Olivetti’s death in 1960 threatened to derail 
the plans he had set out for the company to 
further expand into computers. Moreover, 
the firm was in a downward spiral, following 
his decision in 1959 to buy his main competitor, 
the US typewriter firm Underwood. Yet Mario 
Tchou, a key engineer who oversaw the com- 
pany’s electronics work, was already thinking 
about shrinking mainframes into something 
that could sit on a desk. Adriano’s talented 
but less savvy son Roberto oversaw manufac- 
turing of the Programma 101 (P101) desktop 
computer, which made its debut in 1965. It 
was the world’s first PC, and sold an astonish- 
ing 44,000 units over several years, including 
some to NASA. But the company’s computer 
manufacturing was eventually overtaken by its 
competitors, particularly in the United States. 

That sounds like the guts of a great technol- 
ogy history. The book’s subtitle, meanwhile, 
promises spy-versus-spy intrigue involving 
the CIA and US computer giant IBM. However, 
Secrest focuses more on the Olivetti family than 
its products. There is a bare-bones descrip- 
tion of the P101 and how it was developed: the 
programming system, Secrest notes, “took an 
enormous amount of experimentation”. But 
there is little more on what must be an intrigu- 
ing techie history. 

Olivetti’s table-top computer, ‘Programma 101’, in 1966. 

Secrest thus also misses several opportu- 
nities to tease out intriguing storylines. For 
example, Adriano Olivetti’s insistence that 
something sitting in your office should be 
both functional and beautiful almost certainly 
inspired Apple co-founder Steve Jobs. The aes- 
thetic similarities between Olivetti’s 1960s-era 
showroom on Fifth Avenue in New York City 
and today’s iconic Apple stores are uncanny. 

The book’s treatment of espionage is at 
times more detailed than its take on tech. 
Secrest describes fascinating wartime con- 
tacts between Adriano Olivetti and British 
and US intelligence agencies. While feigning 
loyalty to the Fascist Party, the industrialist was 
secretly meeting with the US Office of Strate- 
gic Services (OSS), the predecessor to the CIA, 
which dubbed him ‘Agent 660’. There is drama 
in this. But as Secrest makes clear, Adriano was 
no 007; the OSS never acted on his plans, and 
British intelligence seemed to dismiss him as a 
dreamer and deemed his convoluted scheme 
for toppling Mussolini unrealistic. 

The narrative takes a stranger turn around 
Adriano Olivetti’s death. It seems plausible that, 
saddled with mounting debt and Underwood's 
outdated factories, a 58-year-old businessman 
might die of a heart attack. Instead, Secrest 
decides that the CIA murdered Olivetti — as 
well as Tchou, who died in a car accident in 1961. 

Gaining access to CIA records is certainly 
arduous, and Secrest describes her unsuccess- 
ful attempt to meet with the agency's historian, 
David Robarge. In the absence of insider insights 
or access to fresh archival records, she turns to 
a car-repair shop owner in Rockville, Maryland, 
for confirmation of her theory that the CIA engi- 
neered the car accident that killed Tchou. 

The CIA, of course, really has attempted 
to assassinate certain figures, such as Patrice 
Lumumba, the first prime minister of the Dem- 
ocratic Republic of the Congo. But Secrest pre- 
sents no evidence that US spies were involved in 
Olivetti’s death. She implies that IBM, too, was 
somehow implicated, citing cold-war compe- 
tition and the company’s work for the US gov- 
ernment and intelligence. (She reminds us that 
IBM, as documented by Edwin Black in his 2001 
book IBM and the Holocaust, sold technology 
to the Nazis in the 1930s.) A link between that 
and the Olivetti affair is never aired, however. 

This conspiracy-mongering is a shame. 
Secrest does all the right research, and the clues 
to the company’s troubles (and Olivetti’s woes) 
are right in front of her. In an era of rampant 
conspiracy theories, such as bizarre allegations 
involving the Jewish Hungarian-American bil- 
lionaire George Soros, we rely on scholarship 
to pull out the facts, not just the speculation. 

A more interesting historical question is 
why US computer science advanced so quickly 
during the cold war, leaving Europe behind for 
decades. It’s likely that this happened because 
the Pentagon and US intelligence agencies 
invested in companies and technologies that 
had no immediate commercial prospects, but 
served US strategic interests (see page 481). 
The relationship between spies, soldiers and 
computer scientists during and after the sec- 
ond half of the twentieth century is worthy of 
serious exploration. The Mysterious Affair at 
Olivetti does not offer that. 

Yet this book is, in other ways, a laudable 
attempt. It shines when describing Adriano 
Olivetti’s interest in architecture (Secrest 
authored the 1992 book Frank Lloyd Wright: 
A Biography). Secrest writes well on the aes- 
thetics of Olivetti machines and Adriano’s 
attraction “to clean, boxy lines”, the signature 
of the Bauhaus movement. Her biographer’s 
instinct — choosing a visionary figure whose 
contributions have not been fully appreciated 
— is also to be applauded. As she laments, “the 
Programma 101 has not been well served by 
computer historians on or off the Internet.” 
She is right. That record remains to be filled. 


Sharon Weinberger is the author of The 
Imagineers of War: The Untold Story of DARPA, 
the Pentagon Agency That Changed the World. 
e-mail: sharonweinberger@gmail.com 

Four years after the first issue of Nature 
was published, the US National Academy 
of Sciences (NAS) faced an existential 
crisis. In October 1873, one of its original 
members demanded the expulsion of 
another member for swindling. Josiah Whitney, 
California’s state geologist, accused Benjamin 
Silliman Jr, professor of applied chemistry at 
Yale University in New Haven, Connecticut, 
of accepting large sums from California oil 
companies in return for favourable, possi- 
bly fraudulent, science. Silliman responded 
forcefully that company funding for science 
was evidence of responsibility, not miscon- 
duct: companies needed objective “technical 
opinions”. Without science, swindling would 
be more common, he argued. 
NAS president Joseph Henry, secretary of 
the Smithsonian Institution and a former con- 
sultant to Samuel F. B. Morse, inventor of the 
telegraph, had to agree. If the NAS expelled 
every member who had ever consulted for a 
private company, it would not survive. Henry 
rejected the efforts to remove Silliman. More 
importantly, he resolved to expand the NAS 
membership; new members were to be judged 
on the basis of their research, not on the source 
of their income’. By the 1870s, it was already 
clear that industry relied on science. 

The Silliman-Whitney controversy marked a 
watershed in the relationship between science 
and industry. For US scientists, as well as many 
in Britain and Europe, private companies had 
become valuable patrons, supplying both funds 
for research and problems to be researched, 
and were gainful employers who provided 
short-term commissions. Likewise, companies 
regarded scientists and their findings as prof- 
itable to the development of their respective 
industries. 

Over the next 150 years, relations between 
science and industry continued to evolve — in 
four significant stages. Scientists moved from 
part-time consultants to full-time corporate 
researchers, and then to academic entre- 
preneurs. Industry grew from a scattering of 
local businesses to a concentration of large 
companies, and on to multinational corpo- 
rations with global reach. Although these 
transformations might seem symbiotic, and 
even inevitable, the very fact that US scien- 
tists and industries emerged as leaders and 
exemplars (in terms of employment, funding, 
publishing, patenting and innovating) serves as 
a cautionary reminder of the contingent nature 
of such developments. 


Consultancy (1820-80) 


At the heart of the NAS crisis was an essential 
tension in the relations between science and 
industry: can the pursuit of knowledge be 
corrupted by the pursuit of profit? To Whitney 
and his allies, the answer was obviously yes. 
Their ‘pure’ science needed to be practised in 
places protected from the profit motive, such 
as government agencies or well-endowed uni- 
versities. Silliman and supporters of ‘applied’ 
science, by contrast, believed the interactions 
between science and industry to be mutually 
advantageous. Indeed, the emergence of a 
distinct kind of endeavour called applied sci- 
ence characterized a new era in which research 
would address more and more industrial con- 
cerns, and private enterprise would, ideally, 
become a steady supporter of that work’. 

The profession of scientific consulting goes 
back to the early nineteenth century, when indi- 
viduals or groups of capitalists occasionally 
commissioned scientists to examine prospects 
in farming, mining, transportation (canals 
and railroads) and manufacturing. These 
fee-for-expertise engagements were short 
term and advisory. By the 1870s, changes in 
US commercial law (similar to those in British 
and European law) allowed the formation of 
limited-liability, joint-stock companies. These 
businesses, with their large pools of funds and 
numerous shareholders looking for investment 
assurances, regularly consulted scientists. As 
the engagements became both more routine 
(continuous testing and analysing of existing 
products and processes) and more investi- 
gative, scientists began to receive lucrative 
contracts and retainers’. 

In the United States, geologists were 
among the most active consultants during 
the Gilded Age, a period of rapid economic 
growth from the 1870s to the 1890s, especially 
in precious-metal mining in the area west of 
the Mississippi River. In Britain and Germany, 
the most prolific consultants were chemists, 
because of their essential expertise in new prod- 
ucts such as acids, soaps, paints and especially 
synthetic dyes, including mauve and alizarin. 
Consulting chemists also found themselves in 
prominent public roles as expert witnesses in 
sensational patent cases. Witness-box quarrel- 
ling among chemists made good newspaper 
copy, and it highlighted profound develop- 
ments in the chemical industries. Changes in 
patent law in the United States, Britain and 
Germany allowed inventors to claim those 
new chemical products and processes as their 
intellectual property (IP) instead of judging 
them to be scientific discoveries, which were, 
by definition, unpatentable. 


Industry (1880-1940) 


At the turn of the twentieth century, the 
independent consulting scientist was replaced 
by the salaried researcher in new industrial 
laboratories. These labs represented the incor- 
poration of applied science; that is, the creation 
of a separate place within the organization for 
‘research and development’ — a phrase that 
entered the lexicon at this time. 

In Germany, the largest dye companies, such 
as Bayer, Hoechst and BASF, were the first to 
establish dedicated labs for chemical research. 
These were connected to production depart- 
ments, also staffed by university-trained chem- 
ists, and to specialized legal departments, from 
which the new products and processes were 
submitted for patenting. This type of indus- 
trialized invention, with close connections 
between German academic chemistry and 
company labs, was firmly established before 
the First World War’. 

In the United States, the prototype for the 
industrial research lab appeared in the electri- 
cal industry, when inventor Thomas Edison set 
up an ‘invention factory’ in Menlo Park, New 
Jersey, in 1876. Edison wanted to replace what 
had been an unpredictable act of creative genius 
with a regular and reliable system. He recruited 
machinists, mechanics, chemists, physicists and 
mathematicians to work on technical problems 
connected to telegraphy and electric lighting. 
Although their efforts were collaborative, only 
the ‘Wizard of Menlo Park’ (the singular inven- 
tor) was listed on more than 1,000 US patents, 
including those for the phonograph (1878) and 
electric light bulb (1880)*. 

The looming expiration of that original light- 
bulb patent and the threat from other lighting 
companies impelled General Electric (GE), the 
corporation that took over Edison’s Electric 
Light Company and all his patents, to estab- 
lish the aptly named Research Laboratory in 
1900 in Schenectady, New York. This proved 
profitable within a decade — commercially, 
with the invention of a new light bulb that 
restored GE to its dominant market position, 
and professionally, with the recruitment of 
more than 250 engineers and scientists. 

A few other large US corporations followed 
suit and pioneered their own formal research 
and development (R&D) labs — DuPont (1903), 
Westinghouse Electric (1904), American Tele- 
phone and Telegraph (AT&T, 1909) and Eastman 
Kodak (1912). 

It was the First World War and the embargo 
on all German products, especially chemi- 
cals, that was the catalyst to the golden age 
of ‘industrial research’, a neologism of the 
1920s. Between 1919 and 1936, US corporations 
established more than 1,100 labs in nearly all 
industries — petroleum, pharmaceuticals, cars, 
steel — thereby dominating the world’s indus- 
trial research. In 1921, these employed roughly 
3,000 engineers and scientists; by 1940, there 
were more than 27,000 researchers. At the end 
of the Second World War, the figure was nearly 
46,000 (ref. 5). 

This remarkable proliferation reflected 
the massive scale of vertically integrated 
corporations that controlled nearly all areas 
of their respective industries, from natural 
resources through R&D to mass production 
and mass marketing. Industrial research was 
also fuelled by radical changes in US patent law 
that allowed these behemoths to claim the IP 
of their employees. The inventor was now the 
corporation. 

During the Great Depression, critics singled 
out modern big business for its ruinous con- 
sequences to society — unemployment, 
overproduction and bankruptcy. Having 
research in thrall to industry raised the alarm, 
again, that capitalism corrupted science. So cor- 
porate captains and R&D directors marshalled 
the cornucopia of wondrous consumer prod- 
ucts (‘technology’ in the new parlance) created 
by their science-based industries. In this story, 
science in industry was good; it guaranteed 
efficacy, efficiency and safety. In words that 
nineteenth-century consulting scientists would 
have understood, consumers could trust these 
modern technologies (and their corporations) 
because of the R&D. 

At the World’s Fair in New York City in 1939, 
industry paraded the fruits of its science. 
The Radio Corporation of America (RCA) 
introduced consumers to the television. 
International Business Machines (IBM) showed 
off its electric typewriter. GE exhibited its new 
electrical refrigeration system, and DuPont, 
under its banner “Better Things for Better Liv- 
ing through Chemistry”, showcased a synthetic 
fibre called nylon®. 

Fears of corporate corruption of science were 
put to rest by awards of the Nobel prize. In 1931, 
two Germans, Carl Bosch and Friedrich Bergius, 
became the first industrial researchers to win in 
chemistry. The next year, GE’s Irving Langmuir 
won the chemistry prize, and in 1937, Clinton J. 
Davisson of Bell Telephone Laboratories (Bell 
Labs) won a share of the Nobel Prize in Physics. 

US firms paraded the fruits of their industrial research at the 1939 World's Fair in New York City. 

The largest research facility in the United 
States was Bell Labs, established in 1925 in New 
York City to consolidate the R&D arm of AT&T 
and Western Electric, its telephone-manufac- 
turing arm. The labs had around 3,600 staff 
members and a budget in excess of US$12 mil- 
lion. (GE allocated less than $2 million to its 
Research Laboratory.) The first president of 
Bell Labs was the physicist Frank Jewett. In 1939, 
he became the first industrial scientist to be 
president of the NAS’. 

In short, national standing and international 
acclaim seemed to confirm that science done 
under the auspices of industry was equal to 
science in universities or governments. Still, 
industrial labs of the 1920s and 1930s were not 
simply universities without students. As insti- 
tutions of applied science, they always needed 
to show corporate headquarters their value in 
terms of profitable products and processes. 


Military (1940-80) 

By the time the New York World's Fair closed in 
October 1940, Europe was already at war. The 
United States entered in December 1941, and 
the Second World War transformed the rela- 
tionship between science and industry, along 
with the very terms — and even the history — of 
those relations. 

The prime mover in all those changes was the 
US military and the unprecedented amounts 
of money it allocated — through new forms of 
contracting and subcontracting — to scientific 
research. During the war, the Office of Scientific 
Research and Development, under its director 
Vannevar Bush, signed more than 2,300 research 
contracts, worth roughly $350 million, with more 
than 140 academic institutions and 320 com- 
panies. About two-thirds of that funding went 
to universities; the Massachusetts Institute of 
Technology (MIT) in Cambridge, for example, 
received more than $200 million for its Radia- 
tion Laboratory for research on radar. Corporate 
R&D also received unrivalled amounts: AT&T was 
allocated $16 million, GE $8 million and RCA, 
DuPont and Westinghouse between $5 million 
and $6 million each*. 

But by far the most prodigious investments in 
R&D flowed from the War Department ($800 mil- 
lion) and the Navy Department ($400 million). 
The largest portion of that went to private 
industry ($800 million), much of it directed 
towards emergent industries with compel- 
ling national-security interests — for example, 
aerospace, electronics, computing and nuclear 
technology’. 

The US military had not intended to become 
the commander-in-chief of US science, but by the 
end of the war it was apparent, at least to Bush, 
that the federal government needed a plan. In 
his 1945 report to US president Franklin D. Roo- 
sevelt, Science — The Endless Frontier, Bush pre- 
sented a vision for US science policy that would 
guide and define both university science and 
corporate R&D throughout the cold war. The 
endless frontier was ‘basic’ research, the kind 
performed “without thought of practical ends”, 
a direct throwback to the nineteenth-century 
idea of pure science. The US military would fund 
this to boost industrial research because, the 
reasoning went, basic research was “the pace- 
maker of technological progress”. 

Here, then, was a new argument. As many 
commentators at the time and since have 
pointed out, it did not reflect either the 
experience of the war years (during which 
multifunctional teams worked on military 
projects such as the atomic bomb or radar) or 
of the previous decades (in which multifunc- 
tional teams worked in R&D labs on corporate 
projects such as the light bulb). Science — The 
Endless Frontier thus propounded a different 
idea for developing new technologies, both 
military and commercial. In time, this became 
known as the linear model of innovation’. 

The theory posits a conveyor belt, beginning 
with basic science and moving smoothly along 
to development, then to manufacturing and 
production, and culminating with technology 
or innovation. Increase the amount of basic sci- 
ence and the (alleged) result would be more 
technology, innovation and overall economic 
growth. Theoretically, basic research was to be 
centred in universities (and military funding 
did transform US universities and their science 
departments accordingly). But corporate R&D 
labs were also contracting with the military, 
as they had been during the war. With these 
military contracts, as well as enlarged funding 
from corporate headquarters (business leaders 
also bought into the linear model), industrial 
labs were redirected away from applied science 
and towards basic research”. 

Such faith in endless scientific innovation 
combined with prodigious financial resources 
led to the creation of central corporate 
research labs. These functioned more or less 
independently, which nicely suited the new 
organizational structure of multinationals. In 
place of vertical integration, sprawling con- 
glomerates adopted horizontal organizational 
structures comprising multiple divisions (the 
M-form organization), in which each division, 
including the central research lab, operated 
on its own. 

Leading research labs relocated to the 
countryside, far removed from headquarters 
and any connection to manufacturing. RCA 
Laboratories Division, for example, expanded 
its campus near Princeton, New Jersey, after 
1945 and started work on colour TV and semi- 
conductors. In 1956, Westinghouse built up its 
research labs in Churchill outside Pittsburgh, 
Pennsylvania, for nuclear research. IBM set 
up its Thomas J. Watson Research Center, 
designed by the modernist architect Eero 
Saarinen, in Yorktown Heights near New York 
City in 1961, to work on lasers, semiconductors 
and other computer-related physics. And Bell 
Labs moved its research headquarters to 
Murray Hill, New Jersey. 

At its height (before 2001), Bell Labs con- 
ducted world-class research in many fields 
(physics, mathematics, radio astronomy) at 
numerous sites. Its largest campus at Naperville 
near Chicago, Illinois, employed 11,000 people. 
The 191-hectare flagship campus at Holmdel, 
New Jersey, some 30 kilometres south of New 
York City, included a magnificent mirrored-glass 
building also designed by Saarinen in 1962. 

These ‘industrial Versailles’ did research 
without much development; they had indeed 
been converted into universities without 
students”. As industrial ivory towers, they 
hoovered up university faculty members 
and PhD scientists and engineers, promising 
them time and resources to pursue their own 
agendas, and offering them open publication 
policies that allowed their results to appear 
in the most prestigious journals. By the mid- 
1950s at RCA in Princeton, half of the staff were 
theoretical scientists and more than 75% of the 
contracts were with the military. DuPont, like- 
wise, increased its scientific staff by 150% in the 
decade after the war, with the greatest growth 
in fundamental chemistry being at its Experi- 
mental Station near Wilmington, Delaware. By 
the early 1960s, the number of engineers and 
scientists employed in US industrial research 
topped 300,000 (ref. 12). 

These leading corporate laboratories — 
Bell Labs, IBM, Westinghouse, DuPont, RCA 
(Princeton), Xerox Palo Alto Research Center 
(PARC, 1970) — became powerhouses of basic 
science. Between 1956 and 1987, 12 corporate 
scientists won Nobel prizes. Bell Labs alone has 
collected eight in physics and one in chemistry 
since the Second World War, including one for 
its most famous technology, the transistor, in 
1956. In the early 1960s, corporate research- 
ers authored 70% of papers appearing in Phys- 
ics Abstracts. By 1980, Xerox PARC matched 
the world’s leading universities on citation 
impact®®. 

With its emphasis on basic science as the 
necessary prerequisite to any future tech- 
nological progress, the linear model was a 
break with the past. It prompted a new inter- 
pretation of the historical relations of science 
and industry. In the 1950s and 1960s, econo- 
mists, historians and other scholars began to 
re-examine the latter half of the nineteenth 
century, and claimed to have discovered a 
‘Second Industrial Revolution’. Character- 
ized by the chemical and electrical industries, 
this revolution involved replacing the old 
trial-and-error methods of invention used 
in the dirty industries of the ‘First Industrial 
Revolution’ (textile factories, coal mines and 
iron foundries) with science-based methods. 
In this revisionist history, glamorous synthetic 
dyes and bright electric bulbs sprang directly 
from the pure science of organic chemistry and 
electromagnetic physics. History thus seemed 
to provide definitive evidence for the necessity 
of continued funding of basic science, as well 
as a ready explanation for why US and West- 
ern European corporations had dominated 
the world’s economy for more than a century”. 
It was not to last. 


Outsourcing (1980 on) 


Corporate investment in basic science had 
been sustained by dominant positions in inter- 
national markets. AT&T, DuPont, IBM, Kodak 
and Xerox held more than 80% market shares 
in their respective core businesses. Then the oil 
shocks of the 1970s, combined with widespread 
stagflation (high inflation, slow growth), weak- 
ened the US and European economies. Global 
competition increased, especially from Japa- 
nese and South Korean firms. In the early 1980s, 
growing free trade squeezed profit margins 
even further. 

In response, US corporations began to 
restructure and downsize. Business leaders and 
shareholders decided that the multi-division 
conglomerate had become too unwieldy 
to compete. A new, leaner corporation was 
required. One way to restructure was out- 
sourcing, replacing internal suppliers with 
external ones. Corporations began to relocate 
their manufacturing, once the backbone of the 
industrial economy, to plants in lower-cost and 
less-regulated countries. (The pace has only 
accelerated, especially after 2001, when China 
joined the World Trade Organization.) 

Another way to downsize was divestiture, 
selling off subsidiaries unrelated to the core 
business. To shareholders seeking quick prof- 
its, long-term corporate research looked like 
a financial liability. The central laboratory 
became a prime target. In 1988, RCA sold off 
its Princeton lab as an independent business, 
Sarnoff Corporation. In 1993, IBM slashed $1 bil- 
lion — roughly 20% — from its R&D budget. The 
German corporation Siemens bought Westing- 
house’s Churchill laboratory in 1997, and in 2002, 
PARC, the former division of Xerox, became an 
independent company. In 1996, AT&T, following 
the break-up of its phone monopoly, spun off 
the vaunted Bell Labs as a separate company, 
Lucent Technologies (in 2016 this was taken 
over by Nokia, the Finnish telecommunications 
company). The Holmdel campus closed in 2007. 
Within a year, just four scientists remained at 
Murray Hill doing fundamental physics research. 
It was the end of an era™. 

Accompanying globalized competitive 
markets, liberalized free trade and shareholder 
short-termism, the US military began to cut 
back funding for basic science at corporate 
labs. With the exception of a few years in the 
early 1980s (US president Ronald Reagan’s 
Strategic Defense Initiative, the ‘Star Wars’ 
programme), the US government steadily 
reallocated research funds to universities and 
other non-profit organizations, particularly 
towards medical schools and research hospitals 
through the National Institutes of Health (NIH). 
With continuous funding, new fields (molecular 
biology, biochemistry and biotechnology, for 
instance) surged past the diminished physi- 
cal sciences. By 1988, only about 10% of basic 
research articles in physics were authored by 
industrial scientists; by 2005, the number had 
plummeted to less than 3% (ref. 15). 

The demise of the corporate research lab 
heralded the death of the linear-model idea. 
Many scholars concluded that it was too sim- 
plistic. The pathway from science to technology 
was neither straight nor singular, and perhaps 
not even one way (technological advances can
also lead to scientific discoveries). For corporate
executives, investment in basic science did not
seem to pay off. DuPont discovered no new
nylons; Kodak failed to produce a revolution
in photography; RCA lost its edge in consumer
electronics; IBM ignored the personal computer;
and Xerox PARC let slip the graphical
user interface.

Bell Labs in the 1990s: a researcher testing data transmission through fibre-optic cable.

In the late 1960s and 1970s, small firms such
as Intel, Microsoft, Apple, Sun Microsystems 
and Cisco Systems did commercialize the basic 
research being done at the larger corporations. 
Without establishing traditional research labs 
of their own, these players came to dominate 
the new information technology (IT) industry. 
In 1991, for example, when Microsoft created 
Microsoft Research — one of the largest indus- 
trial labs of its generation — its declared mis- 
sion was not basic science, but innovation. In 
a more extreme case, Apple co-founder Steve
Jobs shut down a fledgling research lab in 1998
in the belief that innovation would not require
any investment in R&D. 

Until 2010 and the emergence of machine
learning, artificial intelligence (AI) and the
Internet of Things, most technology companies
ignored basic research. In 2012, following
Jobs’s death, Apple began investing in R&D
again, particularly in AI. Likewise, Amazon,
Google, Facebook and Uber began to recruit
AI researchers from academia. This brain drain
has become so serious that universities have
begun to worry about their ability to train
future AI researchers.

Twenty-first-century corporations value 
science (particularly, patentable discoveries) 
and still think that basic research can lead to 
invention and innovation. They would just 
prefer that someone else do it (and pay for 
it). In business terms, they optimize their 
‘supply-chain management’, a phrase that 
gained currency in the 1990s, by replacing 
stable in-house labs (warehouses of scientists 
and engineers) with flexible contract research. 
Their ability to do so was greatly facilitated 
by the US government and the loosening of 
antitrust enforcement. The settlement of the 
monopoly case against Microsoft in 2001, for 
example, stands in stark contrast to the forced 
break-up of AT&T in 1984. 

Moreover, the US government now permitted 
innovative start-ups to acquire new technolo- 
gies, patents and licences from other companies 
and independent non-profit organizations such 
as Sarnoff and PARC, and to engage in exten- 
sive collaborative research with institutes and 
universities. Microsoft Research, for instance, 
now has labs around the globe (New York City, 
Beijing, Bangalore) and on several university 
campuses (MIT, the University of California, 
Santa Barbara, and Cambridge, UK), which 
account for 20% of patents in AI worldwide.
Google, by contrast, mostly underwrites aca- 
demic research through grants, fellowships, 
internships and visiting positions. 


Universities have traditionally been the home 
of basic science. In the twenty-first century they
have also become the source of innovation and 
entrepreneurship, in part because of sweep- 
ing changes in US patent law. In 1980, the US 
Supreme Court (in Diamond v. Chakrabarty) 
significantly expanded what could be patented 
to include new life forms. That same year, the US
Congress passed the Bayh-Dole Act, permitting 
universities to patent the results of research 
funded by the NIH or other federal agencies 
and conducted on their campuses by faculty 
members, students and employees. Universi- 
ties started filing for patents at an increasing 
rate — from 2,266 in 1996 to 5,990 in 2014. The 
university is now an inventor.

The most prominent industry that has been 
transformed by these legal and policy changes 
has been biotechnology. In 1976, a university 
biochemist and a venture capitalist founded 
Genentech, the first biotech firm. Genentech 
focused, as did other biotech start-ups (Amgen 
in 1980 and Genzyme in 1981), on translating
basic science done in universities and, subsequently,
in-house into patents and other
forms of profitable IP. They facilitated that 
linear movement from research to develop- 
ment. Further commercialization towards 
the manufacture and distribution of drugs 
and therapies was taken up by traditional big 
pharmaceutical corporations. Eli Lilly (founded 
in 1876), for example, guided Genentech’s first
drug (synthetic human insulin) through clinical
trials and brought it to market.

The emergence of biotech represented both 
a new business plan (entrepreneurial scientists
partnering with venture capitalists to sell their
research) and a new model of innovation.
Here, industry shifted from a single internal
or closed source of research to multiple external
or open sources. In this model, academic
entrepreneurs, commercialized universities, 
globalized contract-research institutes and 
numerous small research start-ups supply the 
science and the IP. Larger, more established 
firms then develop and commercialize these 
into new products and processes. 

According to some economists and business
scholars, open innovation characterizes
a ‘Third Industrial Revolution’. From
their perspective, the university professor 
seeking to patent the results of federally 
funded research to form a start-up, with seed 
money from venture capitalists, is the direct 
descendant of the consulting chemist of the 
nineteenth century. In this ecosystem, a population
of nimble researchers and small firms
has displaced a pack of lumbering corporate
labs. To critics and less-sanguine academics,
the twenty-first-century relations of science 
and industry illustrate the commodification 
of university research and the corruption of 
the pursuit of knowledge by the profit motive.

Today, a complex innovation web has 
replaced the old conveyor belt. This is another 
new model — global commercialization. 
Supply-chain science is premised on the belief 
that research is a fungible commodity to be 
bought on demand and sold by the lowest-cost 
lab. In some ways, twenty-first-century contract
research is reminiscent of nineteenth-century 
consulting science. In both cases, the question 
remains: is marketplace science trustworthy?

Readers respond


Correspondence 


CAS: with the people 
and the government 


Your Editorial marking
the 70th anniversary of the
Chinese Academy of Sciences
(CAS) points out some of
the remarkable results the
organization has achieved
since its inception (see Nature
574, 5; 2019). However, we find
your take on its history quite
misleading.

For example, you seem 
to overplay the modern 
significance of China’s Cultural 
Revolution. The devastating 
consequences of that have 
been recognized by the Chinese 
government for more than 
40 years, and it has painstakingly
implemented measures to 
reverse the negative effects. This 
great governance has ensured 
that China has witnessed huge 
advances ever since. 

CAS is not run independently 
of government, as you imply. The 
establishment and development 
of CAS have been entirely based 
on the wisdom and support of
the central government. The 
role of the academy in leading 
China’s research has always 
been recognized by China’s 
leadership, which has respected 
science and technology from the 
start — for its own sake as well 
as for developing a sustainable 
economy. 

Contrary to your headline, 
CAS has never sought or 
achieved financial autonomy. 
Over the past 40 years, half of 
its income has come directly 
from central-government 
investment; the rest has been 
from competitive funding 
or technology transfer. CAS 
could not develop without the 
funding and support of the 
central government. And CAS 
is committed to facilitating 
technology transfer to support 
economic development, 
although it does not directly 
invest in the industrial sector. 

The academy has a list of
notable achievements, apart
from those you mention. It 
started China’s first talent 
programme, attracting top- 
quality overseas-trained 
scholars back to China. And CAS 
intends to become a leading 
research institution that satisfies 
scientific interests and regional 
or global needs. We have already 
established 10 joint research 
and education centres overseas 
and, together with another 
36 science organizations, 
have launched the Alliance 
of International Science 
Organizations to address shared 
challenges and to contribute to 
the United Nations Sustainable 
Development Goals. 

You suggest that CAS could be 
a model for science academies 
in other countries — particularly 
in one-party states or those 
with authoritarian leadership. 
Our core competence lies in 
our unique role as a national 
research institution. Although 
every academy should of course 
determine its own development, 
we find that an integrated 
structure combining research, 
education, consultation and 
technology transfer suits us well. 

We object to your allegation 
that the Chinese central 
government takes “harsh 
measures against its people”. 
In carrying out its scientific 
and technical mission, CAS 
stands firmly with the central 
government and with the 
people. We reject any such false 
allegations with disruptive 
intentions and are strongly 
opposed to biased judgments 
of China’s internal affairs, and to 
any unnatural linking of political 
or ideological positions with our 
mission. 


Qingquan Zhang Bureau of 
International Cooperation, 
Chinese Academy of Sciences, 
Beijing, China. 
qqzhang@cashq.ac.cn

Italy’s evaluators:
rankings boom is real 


As president and vice-president 
of the Italian National Agency 
for the Evaluation of Universities 
and Research Institutes 
(ANVUR), we disagree that
Italy has been climbing the
international research-impact 
rankings because Italian scholars 
are citing each other’s articles 
more heavily (Nature 572, 
578-579; 2019). 

Scientific productivity in Italy 
has risen in the past decade, 
possibly stimulated by the 
introduction of performance-related
university funding. Such systems tend to increase
a country’s publications in the
short term, as well as to boost 
the number of citations per 
paper when normalized for each 
field. The use of metrics can itself 
have positive effects on scientific 
output (see D. Checchi et al. 
High. Educ. Q. 73, 45-69; 2019). 

ANVUR recognizes the 
importance of correcting 
gaming behaviour, including 
self-citation. In our most recent 
evaluation exercise (in 2011-14), 
papers in which self-citation 
exceeded a given threshold were 
downgraded. We intend to seek 
evidence of gaming behaviour at 
the individual and article level, 
and clamp down on it in future
evaluations if necessary. 

The Italian research system 
has responded to public 
demand for more transparency 
and accountability. Citation 
doping alone cannot explain the 
concomitant rise in publications 
and citations (see also 
P. D’Antuono and M. Ciavarella 
Nature 574, 333; 2019). The 
rise should in fact be viewed 
with some pride by the Italian 
scientific community. 


Paolo Miccoli, Raffaella I. 
Rumiati ANVUR, Rome, Italy. 
raffaella.rumiati@anvur.it

China’s silicon valley
must protect nature 


We welcome the development 
of Xi’an — China’s former capital 
and the original eastern end of 
the Silk Road — into a high-tech 
city at the heart of the country’s 
2013 Belt and Road Initiative 
for worldwide trade (Nature 
563, S25-S27; 2018). However, 
it is crucial that the ambitious 
infrastructure planning includes 
provisions to protect the city’s 
environment from further 
degradation. 

The nearby Qinling Mountains 
provide 90% of the drinking 
water for the 10 million or 
so residents of Xi’an. The 
range hosts 4,000 plant and 
animal species, and contains 
15 natural and cultural heritage 
sites of ancient civilizations 
going back 5,000 years to 
the Xia dynasty. Developing 
China’s ‘silicon valley’ so close 
to these mountains could 
seriously disrupt the ecosystem 
(S. Thacker et al. Nature Sustain. 
2, 324-331; 2019).

Xi’an is already one of China’s
most polluted cities, with many 
outdated coal-burning factories. 
Only half of the city’s 15 rivers 
are classified as clean. The daily 
discharge of domestic sewage 
into these rivers can reach 
8,000 tonnes. The Zao River, 
which crosses the city’s Hi-tech 
Industries Development Zone, is 
black and malodorous. 

All these issues need to be 
addressed before major changes 
associated with the development 
go ahead (see, for example,
L. Han et al. Sci. Rep. 6, 23604;
2016).

Quantum information


Quantum computing takes flight


William D. Oliver 


A programmable quantum computer has been reported to 
outperform the most powerful conventional computers in 
a specific task — a milestone in computing comparable in 
importance to the Wright brothers’ first flights. See p.505


Quantum computers promise to perform 
certain tasks much faster than ordinary 
(classical) computers. In essence, a quan- 
tum computer carefully orchestrates 
quantum effects (superposition, entanglement 
and interference) to explore a huge computational
space and ultimately converge on a
solution, or solutions, to a problem. If the 
numbers of quantum bits (qubits) and oper- 
ations reach even modest levels, carrying out 
the same task on a state-of-the-art supercom- 
puter becomes intractable on any reasonable 
timescale, a regime termed quantum computational
supremacy. However, reaching this
regime requires a robust quantum processor, 
because each additional imperfect operation 
incessantly chips away at overall performance. 
It has therefore been questioned whether a sufficiently
large quantum computer could ever
be controlled in practice. But now, on page 505,
Arute et al.2 report quantum supremacy using
a 53-qubit processor.

Figure 1| Three types of quantum circuit. Arute et al.2 demonstrate that a quantum
processor containing 53 quantum bits (qubits) and 86 couplers (links between
qubits) can complete a specific task much faster than an ordinary computer can
simulate the same task. Their demonstration is based on three quantum circuits:
the full circuit, the patch circuit and the elided circuit. The full circuit comprises
all 53 qubits and is the hardest to simulate on an ordinary computer. The patch
circuit cuts the full circuit into two patches that are each relatively easy to simulate.
Finally, the elided circuit links these two patches using a reduced number of two-qubit
operations along reintroduced two-qubit connections and is intermediate
between the full and patch circuits, in terms of its ease of simulation.

Arute and colleagues chose a task that is
related to random-number generation: namely,
sampling the output of a pseudo-random
quantum circuit. This task is implemented
by a sequence of operational cycles, each of
which applies operations called gates to every
qubit in an n-qubit processor. These operations
include randomly selected single-qubit gates
and prescribed two-qubit gates. The output
is then determined by measuring each qubit.

The resulting strings of 0s and 1s are
not uniformly distributed over all 2^n possibilities.
Instead, they have a preferential,
circuit-dependent structure, with certain
strings being much more likely than others
because of quantum entanglement and
quantum interference. Repeating the
experiment and sampling a sufficiently large
number of these solutions results in a distribution
of likely outcomes. Simulating this probability
distribution on a classical computer
using even today’s leading algorithms becomes
exponentially more challenging as the number
of qubits and operational cycles is increased.

In their experiment, Arute et al. used a
quantum processor dubbed Sycamore. This
processor comprises 53 individually controllable
qubits, 86 couplers (links between qubits)
that are used to turn nearest-neighbour two-qubit
interactions on or off, and a scheme to
measure all of the qubits simultaneously. In
addition, the authors used 277 digital-to-analog
converter devices to control the processor.

When all the qubits were operated simultaneously,
each single-qubit and two-qubit
gate had approximately 99-99.9% fidelity, a
measure of how similar an actual outcome of an
operation is to the ideal outcome. The attainment
of such fidelities is one of the remarkable
technical achievements that enabled this work.
Arute and colleagues determined the fidelities
using a protocol known as cross-entropy
benchmarking (XEB). This protocol was introduced
last year and offers certain advantages
over other methods for diagnosing systematic
and random errors.

The authors’ demonstration of quantum
supremacy involved sampling the solutions
from a pseudo-random circuit implemented
on Sycamore and then comparing
these results to simulations performed
on several powerful classical computers,
including the Summit supercomputer at
Oak Ridge National Laboratory in Tennessee
(see go.nature.com/35zfbuu). Summit is cur- 
rently the world’s leading supercomputer, 
capable of carrying out about 200 million 
billion operations per second. It comprises 
roughly 40,000 processor units, each of which 
contains billions of transistors (electronic 
switches), and has 250 million gigabytes of stor- 
age. Approximately 99% of Summit’s resources 
were used to perform the classical sampling. 

Verifying quantum supremacy for the sam- 
pling problem is challenging, because this is 
precisely the regime in which classical simu- 
lations are infeasible. To address this issue, 
Arute et al. first carried out experiments in a
classically verifiable regime using three differ- 
ent circuits: the full circuit, the patch circuit and 
the elided circuit (Fig. 1). The full circuit used 
all n qubits and was the hardest to simulate. 
The patch circuit cut the full circuit into two 
patches that each had about n/2 qubits and 
were individually much easier to simulate. 
Finally, the elided circuit made limited two- 
qubit connections between the two patches, 
resulting in a level of computational difficulty 
that is intermediate between those of the full 
circuit and the patch circuit. 
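
To make the difference in difficulty concrete, the following minimal sketch estimates the memory needed to hold a complete quantum state for the full circuit and for a single patch. It assumes a brute-force state-vector simulation that stores 2^n complex amplitudes at 16 bytes each; that storage model and the helper name `state_vector_bytes` are illustrative assumptions, not the simulation methods actually used by Arute et al. The Summit storage figure is the 250 million gigabytes quoted above.

```python
# Rough memory estimate for a brute-force state-vector simulation.
# Assumption (not from the article): each of the 2**n amplitudes is stored
# as a 16-byte double-precision complex number.

BYTES_PER_AMPLITUDE = 16  # assumed: one complex128 value per amplitude


def state_vector_bytes(n_qubits: int) -> int:
    """Bytes needed to store all 2**n_qubits complex amplitudes."""
    return (2 ** n_qubits) * BYTES_PER_AMPLITUDE


if __name__ == "__main__":
    full = state_vector_bytes(53)    # full circuit: all 53 qubits
    patch = state_vector_bytes(27)   # one patch: about n/2 qubits
    summit_storage = 250e6 * 1e9     # 250 million gigabytes, as quoted in the text
    print(f"full circuit: {full / 1e15:.0f} petabytes")
    print(f"one patch: {patch / 1e9:.1f} gigabytes")
    print(f"full circuit / Summit storage: {full / summit_storage:.2f}")
```

Under these assumptions the full 53-qubit state needs on the order of 10^17 bytes, whereas a 27-qubit patch needs only a few gigabytes, which is why the patch circuit serves as a cheap proxy.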

The authors selected a simplified set of two- 
qubit gates and a limited number of cycles (14) 
to produce full, patch and elided circuits that 
could be simulated in a reasonable amount of 
time. Crucially, the classical simulations for 
all three circuits yielded consistent XEB fidelities
for up to n = 53 qubits, providing evidence
that the patch and elided circuits serve as good
proxies for the full circuit. The simulations of 
the full circuit also matched calculations that 
were based solely on the individual fideli- 
ties of the single-qubit and two-qubit gates. 
This finding indicates that errors remain well 
described by a simple, localized model, even as
the number of qubits and operations increases. 
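
The idea of a simple, localized error model can be sketched as follows: if errors are independent, the predicted fidelity of a circuit is roughly the product of the fidelities of every gate and measurement in it. The operation counts and error rates below are hypothetical placeholders chosen to sit near the 99-99.9% range quoted above; they are assumptions for illustration, not values taken from this article.

```python
# Sketch of an independent-error model: the overall fidelity is taken to be
# the product of the fidelities of all operations in the circuit.
# All counts and error rates below are hypothetical, for illustration only.


def predicted_fidelity(operations):
    """operations: iterable of (count, error_rate) pairs, one per operation type."""
    fidelity = 1.0
    for count, error_rate in operations:
        fidelity *= (1.0 - error_rate) ** count
    return fidelity


if __name__ == "__main__":
    hypothetical_circuit = [
        (1000, 0.0016),  # single-qubit gates (assumed count and error rate)
        (400, 0.0062),   # two-qubit gates (assumed count and error rate)
        (53, 0.038),     # one measurement per qubit (assumed error rate)
    ]
    f = predicted_fidelity(hypothetical_circuit)
    print(f"predicted circuit fidelity: {f:.2%}")
```

Even with every operation better than 96% faithful, more than a thousand operations compound to a total fidelity of a fraction of a per cent, which is the scale of the XEB fidelities discussed in the text.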

Arute and colleagues’ longest, directly 
verifiable measurement was performed on 
the full circuit (containing 53 qubits) over 
14 cycles. The quantum processor took one 
million samples in 200 seconds to reach an 
XEB fidelity of 0.8% (with a sensitivity limit of 
roughly 0.1% owing to the sampling statistics). 
By comparison, performing the sampling task 
at 0.8% fidelity on a classical computer (con- 
taining about one million processor cores) took 
130 seconds, and a precise classical verification 
(100% fidelity) took 5 hours. Given the immense 
disparity in physical resources, these results 
already show a clear advantage of quantum 
hardware over its classical counterpart. 
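
The XEB fidelity and its sampling-limited sensitivity can be illustrated with a toy calculation. The sketch below assumes the commonly used linear cross-entropy estimator, F = 2^n × mean(P(x)) − 1, where P(x) is the ideal probability of each measured bit string; the article does not spell out the formula, so treat it as an assumption. The toy 'ideal' distribution (exponentially distributed weights), the noise model (ideal output with probability F, uniform noise otherwise) and the 1/sqrt(N) rule of thumb are likewise illustrative stand-ins, and all function names are hypothetical.

```python
import math
import random


def toy_ideal_distribution(n_qubits, rng):
    """Toy stand-in for a random circuit's output probabilities (assumed model)."""
    weights = [rng.expovariate(1.0) for _ in range(2 ** n_qubits)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_noisy(probs, fidelity, n_samples, rng):
    """Draw bit strings (as integers): ideal with probability `fidelity`, else uniform."""
    dim = len(probs)
    ideal = rng.choices(range(dim), weights=probs, k=n_samples)
    return [x if rng.random() < fidelity else rng.randrange(dim) for x in ideal]


def linear_xeb(probs, samples):
    """Linear XEB estimate: 2**n * mean(P(x_i)) - 1."""
    mean_p = sum(probs[x] for x in samples) / len(samples)
    return len(probs) * mean_p - 1.0


def sampling_sensitivity(n_samples):
    """Rule-of-thumb statistical resolution: about 1/sqrt(number of samples)."""
    return 1.0 / math.sqrt(n_samples)


if __name__ == "__main__":
    rng = random.Random(0)
    probs = toy_ideal_distribution(14, rng)
    samples = sample_noisy(probs, fidelity=0.5, n_samples=100_000, rng=rng)
    print(f"estimated fidelity: {linear_xeb(probs, samples):.3f}")
    print(f"sensitivity for one million samples: {sampling_sensitivity(10**6):.1%}")
```

With one million samples the rule of thumb gives a resolution of about 0.1%, consistent with the sensitivity limit quoted above, so a measured fidelity of 0.8% is distinguishable from zero.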

488 | Nature | Vol 574 | 24 October 2019

The authors then extended the circuits into
the not-directly-verifiable supremacy regime.
They used a broader set of two-qubit gates
to spread entanglement more widely across
the full 53-qubit processor and increased the
number of cycles from 14 to 20. The full circuit
could not be simulated or directly verified in
a reasonable amount of time, so Arute et al.
simply archived these quantum data for future
reference, in case extremely efficient classical
algorithms are one day discovered that would 
enable verification. However, the patch-circuit, 
elided-circuit and calculated XEB fidelities all 
remained in agreement. When 53 qubits were 
operating over 20 cycles, the XEB fidelity cal- 
culated using these proxies remained greater 
than 0.1%. Sycamore sampled the solutions in a
mere 200 seconds, whereas classical sampling 
at 0.1% fidelity would take 10,000 years, and full 
verification would take several million years. 
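
As a quick check on the scale of these figures, the sketch below converts the quoted times into a single ratio. The year length (365.25 days) is an assumption made for the arithmetic, and the function name is hypothetical; the 200 seconds and 10,000 years are the values quoted above.

```python
# Back-of-the-envelope comparison of the quoted run times.
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed year length


def speedup(classical_seconds, quantum_seconds):
    """Ratio of classical to quantum run time."""
    return classical_seconds / quantum_seconds


if __name__ == "__main__":
    quantum = 200.0                        # seconds, Sycamore sampling time
    classical = 10_000 * SECONDS_PER_YEAR  # estimated classical sampling time
    print(f"classical / quantum time: {speedup(classical, quantum):.1e}")
```

The ratio comes out at roughly one and a half billion, with the caveat that the classical figure is itself an estimate rather than a measured run time.
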
This demonstration of quantum supremacy 
over today’s leading classical algorithms on 
the world’s fastest supercomputers is truly 
a remarkable achievement and a milestone 
for quantum computing. It experimentally 
suggests that quantum computers represent 
a model of computing that is fundamentally different
from that of classical computers. It also
further combats criticisms about the controllability
and viability of quantum computation
in an extraordinarily large computational space
(containing at least the 2^53 states used here).
However, much work is needed before quan- 
tum computers become a practical reality. In 
particular, algorithms will have to be developed 
that can be commercialized and operate on 
the noisy (error-prone) intermediate-scale 
quantum processors that will be available in 
the near term. And researchers will need to
demonstrate robust protocols for quantum
error correction that will enable sustained,
fault-tolerant operation in the longer term.

Arute and colleagues’ demonstration is in
many ways reminiscent of the Wright brothers’
first flights. Their aeroplane, the Wright Flyer,
wasn’t the first airborne vehicle to fly, and it
didn’t solve any pressing transport problem.
Nor did it herald the widespread adoption of
planes or mark the beginning of the end for
other modes of transport. Instead, the event
is remembered for having shown a new operational
regime: the self-propelled flight of an
aircraft that was heavier than air. It is what the
event represented, rather than what it practically
accomplished, that was paramount.
And so it is with this first report of quantum
computational supremacy.

Neuroscience

Gut microbes help mice forget their fear


Drew D. Kiraly 


Microorganisms in the gut influence fear-related learning. 
The results of a study that reveals some of the mechanistic 
underpinnings of this phenomenon promise to boost our 
understanding of gut–brain communication. See p.543


The gut’s resident bacteria, collectively called 
the gut microbiota, can have marked effects 
on brain function and on behaviour — but the 
mechanisms underlying this interplay remain 
largely unknown. On page 543, Chu et al.
define these mechanisms in unprecedented
scope and detail. The authors report that
mice lacking a complex microbiota exhibit
altered fear-associated behaviour, changes in
gene expression in cells in the brain, and alterations
in the firing patterns and rewiring ability
of neurons. The work represents a leap forward 
in our understanding of the interplay between 
the gut and brain. 

Animals update their responses to
environmental cues throughout their lives.
This process of behavioural adaptation is 
driven by underlying cellular and molecular 
changes in the brain. Chu and colleagues ana- 
lysed how changes in the gut microbiota affect 
one such adaptation: fear conditioning. 

First, the authors trained mice to associate 
a tone with an electric shock, and measured
how strongly that association was formed. The
association developed normally both in control
animals and in animals that had been treated 
with antibiotics to deplete their gut microbiota. 
The researchers then performed an extinction 
task, in which they repeatedly played the tone 
without an electric shock before measuring
the rate at which the animals updated their
behaviour (such an update indicates that the 
fear response has been extinguished). The 
microbiota-deficient mice were unable to 
update their response, and showed persis- 
tent fearful behaviour long after control ani- 
mals had adapted. Chu et al. found the same 
phenomenon in mice that had been raised 
germ-free in sterile isolators and so had never 
developed a gut microbiota. 

The current study is not the first to examine 
the effects of the microbiota on fear condition- 
ing — previous work has shown a decrease in 
the acquisition of this response in germ-free 
mice compared with controls. But Chu and
colleagues are the first to report a specific 
deficit in fear extinction (Fig. 1). What truly 
sets their work apart, however, is the breadth 
and depth of the mechanistic findings that they 
subsequently went on to gather.

Extinction of the fear response is heavily 
dependent on the function of the brain’s prefrontal
cortex. Chu et al. performed in vivo
imaging of this brain region in their animals 
to analyse both neuronal activity patterns and 
the formation and elimination of structures 
called dendritic spines, which are involved in 
the formation of synaptic connections between 
neurons. During the fear-extinction test, 
control animals showed less dendritic-spine 
elimination and more spine formation than 
did microbiota-deficient animals. The ability 
to create synapses and to maintain appropri- 
ate existing synapses is a key part of synaptic 
plasticity — a process crucial to learning and 
memory, in which the strength of synaptic 
connections changes in response to changes 
in neuronal activity. A higher ratio of spine formation
to elimination might therefore partially
explain why control animals were better able to 
appropriately extinguish the fearful stimulus. 

Tight control of gene expression is also 
crucial for proper regulation of synaptic and 
behavioural plasticity. Previous work has indi- 
cated that changes in the microbiota alter the 
gene-expression profile of the prefrontal cortex 
as a whole, but Chu and colleagues performed
RNA sequencing on single cells throughout the 
region, enabling them to identify gene-expres- 
sion changes in individual cell types. These data 
show that microbiota depletion has a more pro- 
nounced effect on excitatory than on inhibitory 
neurons, setting the stage for future research 
in which the microbiota could be targeted to 
alter the characteristics of specific neuronal 
populations. 

The authors’ single-cell sequencing also 
reveals gene-expression changes in microglia, 
the brain’s resident immune cells. Previous studies
have shown that altering the microbiota
causes changes in microglial gene expres- 
sion and function. Chu and colleagues found 
high expression of genes associated with an 
immature state in the microglia of their microbiota-deficient
animals, a change that might
affect the cells’ ability to function normally.

Figure 1| Multiple effects of the gut microbiota on the brain. The gut’s resident bacteria, the microbiota,
can markedly affect the brain and behaviour. Chu et al. provide evidence that the microbiota is needed for
mice to update their behaviour in response to changing environmental cues: for example, to stop reacting
to a once-frightening stimulus when it is no longer threatening (a phenomenon called fear extinction).
The authors hypothesize that this role in behavioural adaptation involves metabolite molecules that are
produced by the microbiota and circulate in the blood. They suggest that the metabolites modulate the
ability of the brain’s immune cells, microglia, to engulf and degrade structures called dendritic spines
that form synaptic connections between neurons. In addition, microglia could affect neuronal activity
directly. Together, these activities would promote behavioural adaptation. In support of this idea, the
researchers show that changes in the microbiota lead to altered gene expression in microglia and neurons,
and to changes in dendritic-spine maintenance.

In the past decade, it has become clear 
that microglia have a crucial role in synaptic 
connectivity. By engulfing and degrading 
unwanted synapses, the cells ensure that neuronal
connections are pruned or maintained
as needed. Changes in this process can alter
neurodevelopment and are implicated in
psychiatric disease. The researchers’ RNA
sequencing revealed changes in genes related
to the role of the microglia in synapse organization
and assembly. Although Chu et al. did not
directly assess changes in the engulfment of 
synapses, their results lay the groundwork for 
future research into how interactions between 
the microbiota and microglia affect synapse 
density in the brain. 

Finally, Chu and colleagues profiled gut 
metabolites (the molecules produced from 
metabolic processes) to identify molecules that 
might drive the gut-brain interactions they had 
observed. The authors found four metabolites 
that were significantly less abundant in microbi- 
ota-deficient mice than in controls. They there- 
fore posit that the microbiota affects neurons 
and microglia in the brain through metabolites
that are released into the circulation. 

The gut microbiota is highly metabolically 
active, and the theory that the gut and brain 
communicate through circulating microbiota-derived
metabolites is a popular one.
Manipulations of microbial metabolites have
been shown to affect a range of behaviours,
from autism-like actions to those involving
reward-seeking for drugs. Experiments that
manipulate levels of the metabolites identified
by Chu et al. could improve our understanding
of gut–brain communication.

© 2019 Springer Nature Limited. All rights reserved.

Such research could also reveal a route to 
translating the current findings into clinical 
advances. The potential applications are 
wide-ranging, because alterations in cogni- 
tion and synaptic plasticity are seen in nearly 
all neuropsychiatric disorders. Perhaps most 
germane to the current study would be the 
treatment of post-traumatic stress disorder, 
in which people cannot extinguish memories of
frightening or traumatic experiences. Chu and 
colleagues’ work raises the possibility of target- 
ing the gut microbiota and its metabolites as 
a strategy for helping such individuals. Much 
remains to be done, but this study is an impor- 
tant step in our mechanistic understanding of 
the gut-brain axis.

Astrochemistry


The origins of buckyballs in space


Alessandra Candian 


The spectroscopic fingerprints of buckyballs have been 
observed in space, but questions remain about how these 
large molecules form. Laboratory experiments have revealed
a possible mechanism.


A long-standing mystery in astronomical 
spectroscopy concerns diffuse interstellar 
bands, a family of absorption features seen in 
the spectra of the interstellar medium of the 
Milky Way and of other galaxies. First observed 
almost 100 years ago, the origin of any of the 
bands was unknown until 2015, when four of 
them were assigned to the cation of buckminsterfullerene
(C₆₀⁺; the uncharged molecule
is often referred to simply as fullerene,
or colloquially as a buckyball). Fullerene and
its analogue, C₇₀, are by far the biggest molecules
detected in space, raising the question
of how such large species can form in those 
rarified conditions. Researchers have sug- 
gested that fullerene forms in the outflows of 
old, carbon-rich stars known as asymptotic 
giant branch stars — the temperatures and 
densities of these outflows promote chemi- 
stry similar to that of combustion. This could 
lead to the formation of soot, which can contain
fullerene-like structures. Writing in Astrophysical
Journal Letters, Bernal et al. propose a very
different formation route for fullerene.

The carbon atoms in fullerene are arranged 
in the shape of a football, a molecular struc- 
ture that is remarkably stable but also difficult 
to construct. Fullerene has been made in the 
laboratory in experiments designed to probe 
the chemistry that occurs in carbon-rich stars: 
carbon in the form of graphite was vaporized 
into a high-density helium flow, producing carbon
clusters. The discovery that fullerene was
among the reaction products led to the award 
of the Nobel Prize in Chemistry to Harry Kroto, 
Richard Smalley and Robert Curl in 1996. 

However, the range of temperatures required 
to create fullerene in this way is quite specific;
outside that range, molecules known as poly- 
cyclic aromatic hydrocarbons (PAHs) are 
produced instead. These molecules are 2D sec- 
tions of a single layer of graphite (a graphene 
sheet), decorated with hydrogen atoms. Subsequent
experiments have shown that PAHs
that contain more than 60 carbon atoms are 
converted into fullerenes when exposed to
sufficient ultraviolet irradiation.

Figure 1 | Evidence of a mechanism for the formation of buckminsterfullerene in space. Bernal et al.
heated grains of silicon carbide (SiC) and bombarded them with ions, mimicking the conditions experienced
by the dust around old stars. Using a transmission electron microscope, the scientists observed that the outer
layers of SiC had transformed into graphene sheets, as shown in this idealized grain. They also observed
the formation of hemispherical structures with diameters similar to that of buckminsterfullerene (C₆₀) on
the surface of the grains. Their work thus reveals a convincing process through which C₆₀ could form in the
outflows of old stars.

The first astronomical source in which 
fullerene was detected was the star Tc 1 (ref. 7). 
Puzzlingly, however, the emission associated 
with fullerene came from a location far away 
from the star and its ultraviolet photons, 
whereas the PAH emissions were closer to the 
star. On the basis of the previously reported 
laboratory experiments, this is the opposite 
of what should happen if fullerene forms from
PAHs in this source. So how can the locations
of the emissions be explained? 

Bernal and co-workers now report that 
fullerene also forms readily from silicon car- 
bide (SiC), which has been proposed to be the 
first carbonaceous material to condense out 
of old, carbon-rich stars. The authors rapidly
heated grains of the crystalline form of SiC that
is found in highest abundance in meteorites,
and irradiated them with xenon ions, mimick- 
ing the heating caused by shock waves around 
old stars. 

Using a transmission electron microscope to 
image the surfaces of the samples down to the
subnanometre scale, the scientists observed
that the grain material had altered notably as
a result of its treatment (Fig. 1). Silicon atoms
had percolated to the outer layers of the 
grains, leaving behind what looked like sheets 
of carbon atoms in a hexagonal ‘chicken-wire’ 
arrangement — that is, graphene sheets. 

The transformation of the outer layers of 
SiC into graphene sheets at high temperatures
had been reported previously for a different
form of SiC from that studied by Bernal and col- 
leagues. However, Bernal et al. also observed 
the formation of hemispherical structures 
with diameters similar to that of fullerene. 
Their work thus provides a convincing new 
mechanism for the formation of fullerene in 
evolved stars. 

Bernal et al. report another piece of evidence
supporting the idea that SiC grains are rapidly 
heated and bombarded with ions in evolved 
stars. They have identified a fragment of 
the Murchison meteorite — a highly studied 
meteorite that is rich in organic compounds 
— in which the ratio of carbon-12 to carbon-13 
isotopes is typical of material from an old, car- 
bon-rich star. This indicates that the fragment 
was not produced during or after the forma- 
tion of the meteorite, but instead is stardust 
that originated in an old star. The fragment 
has a core of SiC surrounded by graphene 
sheets. However, previous analyses of
graphite-containing stardust found evidence 
only of titanium carbide cores, rather than 
SiC cores. This raises the question of how 
common SiC cores are in graphite-containing 
stardust. 

The rapid heating of SiC grains in the 
presence of hydrogen can lead to the formation
of PAHs. Bernal and colleagues’ findings
therefore suggest that the thermal conversion 
of SiC to graphene sheets in evolved stars
could be the first step in the formation of large
carbon-containing molecules in general: sub- 
sequent (or simultaneous) exposure of the 
graphene to atomic hydrogen produces PAHs, 
whereas ion bombardment produces fuller- 
ene. Alternatively, PAH molecules might be 
molecular intermediates in the formation of 
carbon soot, which can then be broken down 
by ultraviolet irradiation to make PAHs again.

The efficiency of Bernal and colleagues’ 
fullerene-forming mechanism is unknown, 
raising the question of how many SiC grains 
are needed to account for the observed abun- 
dance of fullerene molecules in space. If there 
aren't enough grains, then a further mechanism
will be required to explain the abundance of 
fullerene. By contrast, if there are too many 
SiC grains, what happens to the ‘excess’ fuller- 
ene molecules produced, given that they are 
notoriously difficult to degrade? More experi- 
ments and detailed modelling of the formation 
of fullerene and of other carbon-containing 
large molecules from SiC grains are needed to 
understand this process, and to quantify its 
importance in old stars. 

The launch of the James Webb Space 
Telescope in 2021 will provide powerful new 
tools for studying old stars, among other 
astronomical objects. Observations of fullerene-containing
sources such as Tc 1 will be
able to constrain the regions in which SiC 
grains, fullerene and PAHs are present, provid- 
ing more clues about how large molecules are 
actually formed. Further analysis and model- 
ling of the routes involved will eventually allow 
astronomers to suggest the identities of the 
other mysterious molecules responsible for 
the diffuse interstellar bands.

Optical physics


Light trapping gets a boost 


Kirill Koshelev & Yuri Kivshar 


The ability of structures called optical resonators to trap light 
is often limited by scattering of light off fabrication defects. A 
physical mechanism that suppresses this scattering has been 
reported that could lead to improved optical devices. See p.501 


Devices called optical resonators confine light, 
but for only a limited time because of unavoidable
light emission. On page 501, Jin et al. report
that such emission can be greatly reduced by 
using the interference of light waves known as 
bound states in the continuum. Such waves are 
akin to exotic electron waves that were intro- 
duced in the theory of quantum mechanics 
almost a century ago. The authors’ finding
could have many technological implications 
for nanophotonics, quantum optics and non- 
linear optics — the study of how intense light 
interacts with matter. 

Interference is a common wave phenomenon
in physics, whereby two or more waves pass 
through one another to produce a combined 
waveform. Consider the case in which these 
waves are correlated with one another, either 
because they come from the same source or 
because they have almost the same frequency. 
If the crest of one wave coincides with the crest
of another wave, the combined amplitude will 
be the sum of the individual amplitudes. And 
if the crest of one wave meets the trough of 
another wave, the combined amplitude will 
be the difference in the individual amplitudes. 
These two scenarios are called constructive and 
destructive interference, respectively. 
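
The two cases can be checked numerically. The following is a minimal Python sketch, not taken from the article: it assumes two sinusoidal waves of the same frequency and uses the standard phasor identity for the amplitude of their sum. The function name `combined_amplitude` and the example amplitudes (3 and 2, in arbitrary units) are illustrative assumptions.

```python
import math


def combined_amplitude(a1: float, a2: float, phase_difference: float) -> float:
    """Amplitude of the sum of two sinusoids that share one frequency.

    Uses the phasor identity A^2 = a1^2 + a2^2 + 2*a1*a2*cos(phase_difference).
    """
    squared = a1 * a1 + a2 * a2 + 2.0 * a1 * a2 * math.cos(phase_difference)
    # Guard against a tiny negative value caused by floating-point rounding.
    return math.sqrt(max(squared, 0.0))


# Crest meets crest (phase difference 0): the amplitudes add.
constructive = combined_amplitude(3.0, 2.0, 0.0)

# Crest meets trough (phase difference pi): the amplitudes subtract.
destructive = combined_amplitude(3.0, 2.0, math.pi)

print(f"constructive: {constructive:.3f}")
print(f"destructive:  {destructive:.3f}")
```

With equal amplitudes and a phase difference of pi, the same function returns a value close to zero, which is the complete cancellation that underlies the trapping of light in a BIC.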

The effects of interference can be observed 
for all waves, but interference associated with 
bound states in the continuum (BICs) has 
attracted much attention in photonics over 
the past few years. BICs are formed by the
destructive interference of several ordinary 
light waves that have a similar wavevector — a 
quantity that describes a wave’s velocity and 
direction of propagation. This interference provides
a means of achieving strong confinement
of light and of increasing its amplitude through
a phenomenon known as optical resonance. It
can also be used to tune an optical resonator
into the ‘supercavity’ regime, in which emission
of light from the resonator is restrained.
Several approaches to realizing BICs have been 
suggested for waves in electronic, electromag- 
netic and acoustic systems. 

Figure 1 | Increasing the quality factor of an
optical resonator. Jin et al. report simulations of
and experiments on a light-trapping device known
as an optical resonator. The key characteristic of a
resonator is the quality factor, a measure of the
efficiency of light trapping. This quantity varies
with the wavevector, which describes the velocity
and propagation direction of a wave. The authors
used their resonator to trap light in the form of
waves called bound states in the continuum (BICs).
They then combined these BICs into a single state:
a merging BIC. As this graph shows, a merging BIC
increases the quality factors of all waves that have
similar wavevectors to it.

The concept of BICs was proposed for unusual
states of electron waves by two pioneers of
quantum mechanics, John von Neumann and
Eugene Wigner. They discovered that specific
potentials (potential-energy profiles) could
support spatially localized electron states 
that have energies larger than the maximum 
energy of the potential. In other words, the 
states could be confined even though their 
energies would normally allow them to escape.
In photonics, a light wave that is trapped by
an optical resonator can be converted to a BIC
under certain conditions, a discovery that
was made only in 2008. 

The main characteristic of an optical 
resonator is the quality factor — the ratio of 
the time over which the device can trap light to
the period of the wave’s oscillation. If the light 
waves destructively interfere to form BICs, the 
quality factor greatly increases. Moreover, in 
the BIC regime, the quality factor theoreti- 
cally tends to infinity when one of the system 
parameters, such as the size of the resonator, 
is tuned. By contrast, the quality factor of a
conventional resonance is not substantially
affected by parameter variations. 
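
As a rough illustration of this definition, the sketch below computes a quality factor as trapping time divided by oscillation period. It is an assumption-laden example rather than anything reported by Jin et al.: the trapping time of 0.1 nanoseconds is hypothetical, the wavelength is the telecommunication value of about 1,550 nanometres used in the experiments, and the function names are invented for illustration. Many textbooks include an extra factor of 2π in the definition; it is omitted here to follow the wording above.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second (exact SI value)


def oscillation_period(wavelength_m: float) -> float:
    """Period of a light wave with the given vacuum wavelength."""
    return wavelength_m / SPEED_OF_LIGHT


def quality_factor(trapping_time_s: float, wavelength_m: float) -> float:
    """Quality factor as worded in the text: trapping time / oscillation period."""
    return trapping_time_s / oscillation_period(wavelength_m)


wavelength = 1550e-9    # telecommunication wavelength, in metres
baseline_time = 1e-10   # hypothetical trapping time, chosen only for illustration

q_baseline = quality_factor(baseline_time, wavelength)
# Trapping the light ten times longer raises the quality factor tenfold.
q_enhanced = quality_factor(10 * baseline_time, wavelength)

print(f"period:     {oscillation_period(wavelength):.3e} s")
print(f"Q baseline: {q_baseline:.3e}")
print(f"Q enhanced: {q_enhanced:.3e}")
```

Because the period of near-infrared light is only a few femtoseconds, even a trapping time well below a nanosecond corresponds to a quality factor in the tens of thousands.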

In practical optical resonators, the quality 
factors of BICs are fundamentally limited by 
inevitable fabrication defects, which scatter 
light out of the plane of the device. Any light 
wave that is scattered off a structural imper- 
fection changes its wavevector. To prevent 
scattering losses, waves must remain trapped 
in the resonator even after these changes 
have occurred. In other words, the quality 
factor needs to be high both before and after 
scattering. 

Jin and colleagues have suggested and 
demonstrated an innovative physical mech- 
anism for achieving optical resonances that 
are extremely robust to out-of-plane scatter- 
ing. They considered a structure called a 
photonic crystal slab, consisting of a submicrometre-thick
dielectric (electrically insulating)
membrane patterned with a square lattice of 
circular holes. 

The authors first ran numerical simulations 
to study the optical resonances in their mem- 
brane. By carefully selecting the membrane’s 
parameters, they achieved several simulated 
BICs that had different wavevectors. They then 
altered the periodicity of the lattice until the 
BICs had the same wavevector. This gave rise 
to a new type of optical resonance: a merging
BIC (which one might refer to as a super-BIC; 
Fig. 1). The hallmark of a merging BIC is that it 
increases the quality factor of all waves that 
have nearly the same wavevector as the res- 
onance, reducing scattering losses from the 
resonator. 

Jin et al. then experimentally demonstrated 
their mechanism by fabricating a set of silicon 
membranes that had different lattice periodic- 
ities. Some of these membranes supported a 
merging BIC at telecommunication wavelengths 
(about 1,550 nanometres) and others were close 
to this merging-BIC regime. The authors used a
tunable telecommunication-wavelength laser 
to measure the intensity of scattered light along 
different directions for each of the samples. 
They found that the membranes supporting a 
merging BIC had a quality factor that was about
10 times larger than that for the membranes not 
in the merging-BIC regime. Moreover, they 
showed that the observed increase in quality 
factor was robust by finding a similar level of 
enhancement in all of the fabricated samples 
that had a merging-BIC design. 

The demonstration could have many 
consequences for engineering high-quality 
resonances in nanophotonics. The ability to 
convert light waves into BICs allows the realiza- 
tion of the supercavity regime, in which highly 
compact resonators can have extremely large 
quality factors. Dielectric materials that have
high refractive indices could be used to reduce 
the resonator dimensions and to combine indi- 
vidual BIC resonators that have high-quality 
resonances into structured arrays.

We predict that an electromagnetic theory
will be developed for describing high-quality 
resonances in individual dielectric nano- 
particles of high refractive index and arrays 
of such nanoparticles, and that they all will be 
expressed in terms of the mathematics used 
to study interference in quantum mechanics. 
In the real world, the engineering of quality 
factors in the BIC regime could lead to substantial
enhancement of nonlinear and quantum
effects, the development of lasers that consume 
little power, and the realization of nanoscale 
resonators that facilitate strong confinement 
of light and large boosts to its amplitude.

Lactate links metabolism to genes


Luke T. Izzo & Kathryn E. Wellen 


Cells regulate gene expression in part through the chemical 
labelling of histone proteins. Discovery of a label derived 
from lactate molecules reveals a way in which cells link gene 
expression to nutrient metabolism. See p.575 


Cellular metabolism involves the uptake, 
release and biochemical interconversion of 
nutrients to produce energy and synthesize 
complex molecules. The intermediates and 
end products of metabolism also have essential 
signalling functions, modulating cell signal- 
ling and gene expression in accordance with 
nutritional resources. One way in which
these metabolites signal is through the chemical
modification of proteins such as histones.
On page 575, Zhang and colleagues describe
their discovery of a previously unknown his- 
tone modification, lactylation, derived from 
the cellular metabolite lactate. 

Histones are central components of chromatin,
a complex of DNA and proteins that
organizes and regulates the genome. They
can be altered by cellular enzymes, which add
chemical tags such as methyl, acetyl and phosphate
groups; these epigenetic modifications
to the genome affect processes such as gene 
expression and DNA replication and repair. 
Zhang et al. predicted that histones might also
be altered by the addition of lactyl groups, and 
they began their search for lactylation by using 
a technique called mass spectrometry, which 
has enabled the identification of numerous 
protein modifications in the past few years. By
looking for shifts in the masses of amino-acid 
residues that make up histone tails, the authors 
deduced the presence of a modified lysine ami- 
no-acid residue, consistent with the addition 
of a lactyl group. Zhang et al. validated this
finding by comparing synthetic peptides that
had been chemically modified in this way with
the corresponding peptides identified in cells. 

The authors also used metabolic tracing with 
a form of lactate labelled with a stable isotope
of carbon (¹³C₃-lactate) to demonstrate that lactate
is involved in histone lactylation. They further
found that levels of lysine lactylation rose
when cells were treated with increasing doses 
of lactate. So, histone lactylation is derived 
from lactate and is sensitive to lactate levels. 

Lactate is an abundant metabolite produced 
during glycolysis, a central metabolic process
in which glucose consumed by cells is broken 
down to generate energy. During glycolysis, 
glucose is converted into two pyruvate mol- 
ecules; these can be either funnelled into lac- 
tate production or transported into the cellular 
power generators (the mitochondria), forming 
the intermediate acetyl coenzyme A (acetyl- 
CoA) and thence entering the Krebs cycle 
for energy production. Lactate is produced 
through glycolysis in various cell types, including
cancer cells and immune cells. Its production
is also enhanced under certain conditions,
such as hypoxia (low oxygen levels), which suppresses
pyruvate entry into the Krebs cycle.
Zhang and colleagues’ discovery that lactate 
is used for histone modification is intriguing 
both because of the metabolite’s abundance 
and because its production, uptake and use are 
all subject to dynamic regulation.

One substantial question that the authors
aimed to address is whether lysine lactylation
responds to metabolic alterations in cells. 
Other metabolite-derived protein modifications
(such as lysine acetylation, derived from
acetyl-CoA) are metabolically sensitive, providing
a precedent for this idea. Zhang et al.
found that the amount of glucose available 
to cells grown in vitro dynamically regulates 
the lysine lactylation of histones in those cells. 
Furthermore, tracing of isotopically labelled 
glucose (¹³C₆-glucose) showed that lysine lactylation
depends on glycolysis. The authors
used several perturbations to promote lactate 
production (including hypoxia and inhibitors 
of mitochondrial metabolism) and to suppress 
it (using inhibitors of pyruvate conversion to 
lactate). The cumulative data indicate that 
lysine lactylation is highly sensitive to lactate 
production through glycolysis. 

Zhang et al. next sought to investigate the 
biological functions of lactylation, selecting 
macrophages as their model. Macrophages are 
immune cells that can take on pro-inflammatory 
(termed M1) or anti-inflammatory (M2) charac- 
teristics; they undergo metabolic changes that 
correspond to these functions. For example,
macrophages that encounter signs of bacte- 
rial infection activate inflammatory genes and 
upregulate glycolysis. Zhang et al. stimulated
macrophages with bacteria or with the bacterial 
component lipopolysaccharide (LPS) to induce 
M1 characteristics. They found that glycolysis
increased and that intracellular levels of lactate 
rose progressively, parallelling an increase in 
histone lactylation (Fig. 1a). Notably, inflamma- 
tory genes typically associated with M1 charac- 
teristics were upregulated rapidly on exposure 
to LPS, but did not correlate with lactate lev- 
els or lysine lactylation (Fig. 1b). Instead, the 
increase in lysine lactylation was slower and 
correlated with the upregulation of homeo- 
static genes (those involved in maintaining a 
biological steady state). 

The authors went on to investigate where 
lysine lactylation occurred in the M1 genome, 
as well as how the modification altered gene 
expression. They found that lysine lactylation 
was high in gene promoter regions (which 
mark the start points of gene transcription), 
and associated positively with the levels of 
messenger RNA produced from those genes. 
The authors also compared lysine lactylation 
and acetylation, finding lactylation at many 
genes that lack acetylation — suggesting dis- 
tinct roles for the two modifications. Moreover, 
macrophages that could not produce lactate 
could increase the expression of inflammatory 
genes in response to stimulation with LPS, but 
could not upregulate lysine lactylation or the 
associated expression of homeostatic genes at 
later times. These temporal dynamics led the 
authors to propose that a delayed ‘lactate timer’ 
involving histone lactylation drives the activation
of genes involved in resolving infections to
help re-establish tissue homeostasis (Fig. 1b).

Figure 1 | A new epigenetic modification, histone lactylation. Zhang and colleagues have discovered a
chemical modification called lactylation — the addition of a lactyl (La) group to the lysine amino-acid residues 
in the tails of histone proteins. a, Stimulating immune macrophage cells with lipopolysaccharide molecules 
(mimicking bacterial infection) increases the conversion of glucose to energy through glycolysis. This, in turn, 
leads to increases in intracellular levels of the molecule lactate, and to lactylation of histones at promoter
DNA sequences. However, it is unclear which enzymes generate the intermediate molecule lactyl-CoA, from
which La is derived, or which enzymes deposit (writers), remove (erasers) or recognize and interpret (readers) 
histone lactylation. b, Zhang et al. report that the increase in lysine lactylation is delayed following macrophage 
stimulation. This delay correlates with changes in the expression of homeostatic genes involved in maintaining 
a biological steady state, but not with changes in inflammatory-gene expression. The authors therefore 
hypothesize that lactylation generates a ‘lactate timer’ to restore normal tissue function after infection. 


These findings raise questions about the biochemistry
of lactylation and its broader roles in
physiology and disease. In terms of biochemistry,
the authors show in a cell-free system that
lactyl-CoA is a lactyl-group donor for lysine lactylation.
So far, however, the enzymes that produce
lactyl-CoA from lactate in the cell, as well
as the cellular concentrations of lactyl-CoA, are 
unknown. Other unresolved questions concern 
the way in which lysine lactylation is regulated
by the enzymes that deposit, read or remove
this label. In the authors’ cell-free system, an 
acetyltransferase enzyme known as p300 can 
catalyse the transfer of lactyl from lactyl-CoA 
to histones, but whether it does this in cells has 
yet to be tested. 

In terms of the broader roles of this modi- 
fication, lactate is generated by cells both in 
physiological contexts — such as in skeletal 
muscle during exercise — and in the context 
of diseases such as cancer. In addition, lactate is
taken up by cells of healthy tissues and tumours 
to feed the Krebs cycle. So, as well as occurring
in glycolytic cells, lactylation might also
participate in communication between cells.
On this note, high lactate levels in the environ- 
ment around tumours are known to promote 
immunosuppression, and Zhang et al. found
that histone lysine lactylation was greater in 
tumour-associated macrophages than in those 
from another tissue. All in all, the authors’ discovery
of histone lactylation provides a launch
point for a deeper investigation into the roles 
and regulation of this modification, which 
links cellular metabolism to gene regulation 
and could have numerous implications for 
human health.

Early tetrapods had an eye on the land


Nadia B. Fröbisch & Florian Witzmann


Fossil finds that can provide clues about how aquatic 
vertebrates evolved into land dwellers are elusive. But the 
ancient bones of a newly discovered species of tetrapod now
provide some crucial missing evidence. See p.527 


Following the scientific investigations into how 
vertebrates transitioned from water to land is 
like reading a good crime novel. We have a range
of suspects, patchy evidence and a lot of unan- 
swered questions. And to complicate matters, 
this transition from finned fish to four-limbed 
creatures (tetrapods) is a ‘cold case’ from nearly 
400 million years ago. On page 527, Beznosov 
et al. present some compelling detective work
that sheds light on this. 

The earliest-known tetrapod specimens 
are 380-million-year-old bone fragments 
that, although identifiable as belonging to a
tetrapod, do not provide many details about 
what these animals looked like or how they 
lived. There are also fossilized tetrapod
footprints that pre-date these fossil finds 
by more than 14 million years, indicating
the presence of a four-limbed, still fully 
aquatic track maker — but they do not reveal 
what the track maker looked like above the 
soles of its feet. 

More-detailed insights into the body
shape, life and growth of our early vertebrate
ancestors are provided by more-complete
fossil finds, including the iconic tetrapods
Acanthostega and Ichthyostega. However,
these lived 365 million years ago, when
tetrapods had already achieved an impressive
geographical distribution and a diverse variety
of body shapes and ways of life.

By contrast, the earliest phase of tetrapod 
evolution and diversification has long been 
mysterious. However, Beznosov and colleagues 
now describe skeletal fossils of a species they 
call Parmastega aelidae, which is the most 
ancestral (basal-most) tetrapod reported so far. 

Like its known younger relatives, P. aelidae 
was a gill-breathing water dweller, and the
authors estimate that this animal reached
a length of more than one metre. It lived
about 372 million years ago during the Devonian
period, and inhabited a shallow lagoon in
a landmass that is now part of northwestern
Russia. These excellently preserved fossils pro- 
vide crucial data about how the major changes 
in breathing, sensory perception, locomotion 
and feeding might have taken place as tetra- 
pods transitioned to life on land. The discovery 
also raises many exciting questions. 

The most striking features of the P. aelidae 
skull are the large, oval-shaped eye openings, 
which face to the front and side, and which 
are positioned high up, towards the top of 
the skull (Fig. 1). This eye shape and position 
is surprising because it indicates that this 
water dweller was looking above the surface 
of the water. 

Mudskippers (species from the family 
Oxudercidae) are modern amphibious fish 
that inhabit marine mud flats, and they are 
useful living creatures with which to compare 
P. aelidae because their eyes have a similar 
shape and position. Mudskippers peek above 
the water surface to look out for prey and 
potential danger. But what was P. aelidae looking
for? The need to detect enemies on land or
in the air can be ruled out, because during the
late Devonian period, such animals were not 
yet present there. 

One possibility is that P. aelidae was looking 
for prey on the shore. If so, what kind of terrestrial
or semi-terrestrial prey was it watching?
Some have suggested that early water-dwelling 
tetrapods and their closest fish-like relatives 
might have preyed on terrestrial invertebrates 
of the phylum Arthropoda, which includes 
insects. However, the large arthropods that
could have provided sufficient food to sustain 
an animal the size of P. aelidae were still rare
in the Devonian period. Moreover, P. aelidae
had large fangs, which suggests that it preyed
mainly on vertebrates. Perhaps it searched
for fish carcasses stranded on the shore. Or,
to make an even more speculative suggestion,
maybe it scavenged early amphibious tetrapods
that rested near the water. However,
evidence for such creatures has not yet been
found among the fossils of the Sosnogorsk
Formation (the rock layers that contained the
P. aelidae fossil).

Figure 1 | The evolution of tetrapod skulls. Beznosov et al. report 372-million-year-old
fossils of a four-limbed vertebrate (tetrapod) from just before the
time when tetrapods moved onto land. They call this newly discovered species
Parmastega aelidae. Its nasal passages (nares) are close to its jaw and would
have been positioned under water. Water passing through the nares (blue arrow)
would have been used for breathing when it reached the gills (not shown).
P. aelidae could also breathe air directly through a skull opening called a spiracle
(grey). Comparing these ancestral features of P. aelidae with other tetrapods
reveals patterns of evolutionary change. The other tetrapods shown are: an
early tetrapod group called colosteids; seymouriamorphs and embolomeres,
members of a lineage that gave rise to amniotes (birds, reptiles and mammals);
and temnospondyls, which gave rise to modern amphibians (such as frogs
and salamanders). Colosteids lacked spiracles and breathed solely through
their gills using water taken up through the nares. Compared with P. aelidae,
seymouriamorphs and temnospondyls had larger and higher nares, which they
would have used to breathe air (red arrow). These tetrapods lacked spiracles, and
had ears (yellow) instead in that area of their skull. Embolomeres retained the
breathing system used by P. aelidae.
Figure credit: Nadia Fröbisch & Florian Witzmann

Another notable feature of P. aelidae is the 
extremely low position, close to its jaws, of 
the external openings of its nose (the nares), 
which would have been under water (Fig. 1). 
This is in striking contrast to the high posi- 
tion of its eyes and is quite different from 
the configuration of nares in modern-day 
aquatic tetrapod animals, such as croco- 
diles, hippopotamuses or frogs. The eyes of 
those animals sit on top of their head, and 
their nares are likewise positioned high on 
the snout, which enables them to breathe air 
while looking above water. Judging from their 
submerged position, P. aelidae nares acted as 
openings through which an inflow of water 
was directed towards the gills during breath- 
ing. P. aelidae also had the option of breathing 
air through a large opening in its skull called 
a spiracle (Fig. 1), and such a breathing process would probably have been
similar to that used by modern air-breathing fish.

This low position of the nares is found in 
most known early tetrapods (called stem 
tetrapods) of the Devonian period (approxi- 
mately 419.2 million to 358.9 million years ago) 
and Carboniferous period (358.9 million to 
298.9 million years ago). In all of these animals, 
the passage from the nares to the mouth cav- 
ity might still have served to transport water 
rather than air. Some fossils of stem tetrapods, 
such as those of a grouping called colosteids 
(Fig. 1), had lost their spiracle opening — they 
must therefore have relied on gill breathing. 
In some other early tetrapods that arose later 
than P. aelidae and were more evolved than 
their ancestors (a state described as being 
more derived), the spiracle is absent, and its 
place is taken by an ear’. These tetrapods’ nares 
are larger and higher on the snout (Fig. 1) compared
with the ancestral form, suggesting that
they used their nares to transport air towards
the lungs while peeking out of the water when
on the lookout for prey.

The P. aelidae fossils offer a treasure trove 
of information that could help to disentangle 
some of the complex evolutionary changes that 
took place when vertebrates made the transi- 
tion from aquatic to terrestrial life. This discov- 
ery also reminds us that much still remains to 
be learnt in the next gripping chapter of this 
detective story.

Half of all of the elements in the Universe that are heavier than iron were created by
rapid neutron capture. The theory underlying this astrophysical r-process was worked 
out six decades ago, and requires an enormous neutron flux to make the bulk of the 
elements’. Where this happens is still debated”. A key piece of evidence would be the 
discovery of freshly synthesized r-process elements in an astrophysical site. Existing 
models? * and circumstantial evidence’ point to neutron-star mergers as a probable 
r-process site; the optical/infrared transient known as a ‘kilonova’ that emerges in the 
days after a merger is a likely place to detect the spectral signatures of newly created 
neutron-capture elements’ ®. The kilonova AT2017gfo—which was found following the 
discovery of the neutron-star merger GW170817 by gravitational-wave detectors'°—was 
the first kilonova for which detailed spectra were recorded. When these spectra were 
first reported”, it was argued that they were broadly consistent with an outflow of 
radioactive heavy elements; however, there was no robust identification of any one 
element. Here we report the identification of the neutron-capture element strontium in
a reanalysis of these spectra. The detection of a neutron-capture element associated
with the collision of two extreme-density stars establishes the origin of r-process
elements in neutron-star mergers, and shows that neutron stars are made of neutron-rich
matter.


The most detailed information yet available for a kilonova comes from a
series of spectra of AT2017gfo taken over several weeks with the medium- 
resolution, ultraviolet (320 nm) to near-infrared (2,480 nm) spectro- 
graph X-shooter, mounted at the Very Large Telescope at the European 
Southern Observatory. These spectra”” allow us to track the evolution 
of the kilonova’s primary electromagnetic output from 1.5 days until 
10 days after the event. Detailed modelling of these spectra has yet to 
be done, owing to limited understanding of the phenomenon and the 
expectation that a very large number of moderate to weak lanthanide 
lines with unknown oscillator strengths would dominate the spectra”. 
Despite this expected complexity we sought to identify individual ele- 
ments in the early spectra, because these spectra are well reproduced 
by relatively simple models”. 

The first-epoch spectrum can be reproduced over the entire observed 
spectral range by using a single-temperature blackbody with an observed 
temperature of approximately 4,800 K. The two major deviations short
of 1 μm from a pure blackbody are due to two very broad absorption
components (with widths of roughly 0.2c, where c is the speed of light).
These components are centred at about 350 nm and 810 nm (Fig. 1). The 
shape of the ultraviolet absorption component is not well constrained 
because it lies close to the edge of our sensitivity limit and may simply 
be cut off below about 350 nm. The presence of the absorption feature 
at 810 nm in this epoch has been noted previously". 
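
As a rough illustration of the single-temperature blackbody description above, the following sketch evaluates the Planck function at the quoted observed temperature of about 4,800 K and locates its peak on a 1-nm grid. The function names and grid limits are illustrative assumptions and are not taken from the analysis code used in this work; no absorption components, Doppler corrections or instrumental effects are included.

```python
import math

H = 6.62607015e-34    # Planck constant, J s
C = 2.99792458e8      # speed of light, m/s
K_B = 1.380649e-23    # Boltzmann constant, J/K


def planck_lambda(wavelength_nm, temperature_k):
    """Blackbody spectral radiance per unit wavelength (W m^-3 sr^-1)."""
    lam = wavelength_nm * 1e-9
    x = H * C / (lam * K_B * temperature_k)
    return 2.0 * H * C ** 2 / lam ** 5 / math.expm1(x)


def peak_wavelength_nm(temperature_k, lo_nm=100, hi_nm=3000):
    """Wavelength on a 1-nm grid (assumed limits) where the curve is highest."""
    return max(range(lo_nm, hi_nm + 1),
               key=lambda w: planck_lambda(w, temperature_k))


T_OBS = 4800.0  # observed blackbody temperature quoted in the text, K
peak_nm = peak_wavelength_nm(T_OBS)

# Continuum level at the two absorption-feature centres, relative to the peak.
rel_350 = planck_lambda(350, T_OBS) / planck_lambda(peak_nm, T_OBS)
rel_810 = planck_lambda(810, T_OBS) / planck_lambda(peak_nm, T_OBS)
print(peak_nm, round(rel_350, 3), round(rel_810, 3))
```

The two relative values indicate how much blackbody continuum is available at 350 nm and 810 nm, which is the baseline against which the broad absorption components are measured.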

The fact that the spectrum is very well reproduced by a single-tem- 
perature blackbody in the first epoch suggests a population of states 
close to local thermal equilibrium (LTE). We therefore use three separate 
methods of increasing complexity first to determine, without too many 
assumptions, the most likely origin of the spectral features, and then to 
self-consistently model and test our conclusion. These three methods 
are: first, our own LTE spectral-synthesis code; second, the LTE line-analysis
and spectrum-synthesis code MOOG; and third, the moving-plasma
radiative-transfer code TARDIS (see Methods). We use a variety
of spectral-line lists for these codes, all of which yield consistent results.
For our own LTE code, we adopt a fiducial temperature of 3,700 K, which
is our final model's best-fit temperature corrected by the Doppler factor
(-0.23) of the absorption features that we determine below; changing
the temperature of our LTE model in the range 3,700–5,100 K does not
markedly affect our results.

⁴Zentrum für Astronomie der Universität Heidelberg, Astronomisches Rechen-Institut, Heidelberg, Germany. ⁵DTU Space, National Space Institute, Technical University of Denmark, Kongens
Lyngby, Denmark. ⁶Institut für Kernphysik, Technische Universität Darmstadt, Darmstadt, Germany. ⁷GSI Helmholtzzentrum für Schwerionenforschung GmbH, Darmstadt, Germany.

Fig. 1 | Spectrum of the kilonova AT2017gfo, showing broad absorption
features. The spectrum shown was taken with the spectrograph X-shooter 1.5
days after the neutron-star merger GW170817. The dashed black line in the upper
panel is the blackbody component of a blackbody model with broad absorption
lines (see main text). The residuals of data minus blackbody are shown in the
lower panel, with the dashed grey line indicating the 1σ uncertainty on each
spectral bin. The data in the sections overplotted with grey bars are affected by
telluric features or are poorly calibrated regions and are not included in the fit.
F_λ is the flux per unit wavelength.

To identify the absorption features, we seek lines with wavelengths 
blueshifted by 0.1–0.3c, corresponding approximately to 390–500 nm
and 900–1,160 nm in the rest frame (see Methods). The lines will also
be broadened with an observed width that depends on the velocity 
and geometry. For spherically expanding ejecta, the line broadening 
will be similar to the expansion velocity of the gas. We do not attempt 
a detailed geometric model here because it depends on assumptions 
about the geometry of the gas and the wavelength-dependent opacity, 
with substantial relativistic and time-delay corrections. 

We adopt an initially agnostic view of the expected abundances. We 
use solar r-process abundance ratios (the total solar abundances of heavy 
elements with s-process elements subtracted”), as well as abundances 
from two metal-poor stars that are old enough to be dominated by the 
r-process in their neutron-capture abundances”. These three sets 
span a wide range in their ratios of light-to-heavy r-process abundances 
(Fig. 2). We also produce absorption spectra for each element individu- 
ally (Extended Data Figs. 1, 2). 

Our LTE models using abundances from a solar-scaled r-process and
metal-poor stars all show that Sr produces a strong feature centred
at an observed wavelength of roughly 800 nm, as well as features at
wavelengths shorter than around 400 nm, for our adopted blueshift
(Fig. 3; see also Extended Data Fig. 3). The rest-frame wavelengths of the
longer-wavelength features are 1,000-1,100 nm. It is worth noting that 
Sr is typically considered an s-process element because only about 30% 
of the cosmic (solar) abundance is produced by the r-process®”. For 
this reason it has not always been considered in kilonova simulations. 
However, it is one of the more abundant r-process elements, account- 
ing for at least a few per cent by mass of all such elements”. Of all of the 
r-process elements, Sr displays by far the strongest absorption features 
in this region of the spectrum (Extended Data Figs. 2, 3). Ba produces 
strong absorption, as do the lanthanide elements, but only in the optical 
region at wavelengths shorter than about 650 nm. The spectral features 
that we observe can therefore only be due to Sr, an element produced 
near the first r-process peak. 

The 810-nm feature was previously proposed to originate in absorption
from Cs I and Te I. This identification can now be ruled out, because
neither Cs I nor Te I produces strong lines in a plasma at this temperature
(Extended Data Fig. 3). Much stronger lines would be expected from the
ions of other elements that are co-produced with Cs (atomic number
Z = 55) (for example, the lines from La II; see Methods).

The most abundant r-process elements are those in the first peak
(Fig. 2), with mass numbers (A) of around 80, and of these,
it is Sr, Y (Z = 39) and Zr (Z = 40) that are easily detected in a low-density,
roughly 4,000-K thermal plasma, because these elements have low excitation
potentials for their singly charged ions. Seen in this context, the
detection of Sr in AT2017gfo is not surprising, despite prior expectations
that the spectra would be dominated by heavier elements. Furthermore,
the atomic levels in Sr that give the absorption lines observed at
810 nm are metastable. Photo-excitation can increase the population 
in these states, strengthening the 810-nm feature markedly” compared 
with the resonance blue/near-ultraviolet absorption lines. Ba and the lan- 
thanide series contribute substantially to the total opacity of r-process 
material in the optical region of the spectrum (Fig. 2), yet we do not 
detect strong optical features. We cannot on this basis, however, easily 
exclude the presence of elements with mass numbers of more than 140 
or so. Even if we could exclude the presence of heavier elements in the 
outer layers of the thermal, expanding cloud, there is no way from these 
early spectra of excluding the possibility that such elements could exist 
at lower depths or in an obscured component. 

Given that a simple r-process abundance LTE model can account 
well for the first-epoch spectrum, we expand it to the subsequent three 
epochs, while the kilonova is still at least partially blackbody like. With 
a freely expanding explosion we expect to begin observing P Cygni 
lines once the outer absorbing ‘atmosphere’ begins to become more 
optically thin and attain a substantial physical radius with respect to 
the photospheric radius. We fit the first four epochs as a blackbody 
with P Cygni lines from Sr. We fit only the strongest lines in order to 
reduce our computational time to a manageable level, as these lines 
provide most of the opacity at these wavelengths. These fits are shown 
in Fig. 4 and offer a compelling reproduction of the spectra at all three 
epochs. The P Cygni model has free parameters for the velocities of the 
photosphere and atmosphere, which change the shape of the profile. 
The fit is remarkable given its simplicity and our lack of knowledge of 
the system geometry. We note that P Cygni emission components are 
always centred close to the rest wavelength of the spectral lines, so 
the observed wavelength of the emission line is not a free parameter. 
The most prominent emission component observed throughout the 
spectral series is centred close to 1,050 nm, and the weighted restframe 
centre of the near-infrared lines from Sr is also 1,050 nm. This adds to 
our confidence in the line identification based on the simple thermal 
r-process absorption model. 

We further confirm our results using TARDIS (‘temperature and radiative
diffusion in supernovae’), extending this code’s atomic database to
include elements up to ₉₂U by using the latest Kurucz line list with its
2.31 million lines. Our TARDIS models produce results very similar to
our static-code models, reproducing the spectra well (Extended Data
Fig. 6). In particular, the P Cygni emission/absorption structure is well
reproduced as expected, confirming our LTE and MOOG modelling,
and showing Sr dominating the features around 1 μm.

Fig. 2 | Abundances of elements produced by the r-process. Relative r-process
abundances (ε) normalized to the Ba abundance are shown for the Sun and for
two metal-poor stars: one, CS 22892-052, rich in heavy r-process elements,
and the other, HD 88609, rich in light r-process elements. These are the
abundances of the elements used in the inset of Fig. 3.


Given our detection of Sr, it is clearly important to consider lighter
r-process elements in addition to the lanthanide elements in shaping
the kilonova emission spectrum. Observations of abundances in stars in
dwarf galaxies suggest that large amounts of Sr are produced together
with Ba (Z = 56) in infrequent events, implying the existence of a site that
produces both light and heavy r-process elements together in quantity,
as found in some models. This is consistent with our spectral analysis
of AT2017gfo and analyses of its lightcurve. Together with the differences
observed in the relative abundances of r-process Ba and Sr in stellar
spectra, this suggests that the relative efficiencies of light and heavy
r-process production could vary substantially from merger to merger.

Fig. 3 | Thermal r-process-element transmission spectrum. These spectra are
based on the lines formed in a gas in local thermal equilibrium with a
temperature of 3,700 K and an electron density of 10⁷ cm⁻³, broadened by 0.2c
and blueshifted by 0.23c. The spectrum produced by a solar r-process
abundance ratio is plotted as a solid line. Contributions due to Sr (red dashed
line), Ba and the lanthanides (green dashed line) and the remaining r-process
elements (blue dashed line) are also shown. Inset, spectra resulting from a solar
r-process abundance ratio (solid line), and from the abundance ratios of the

Fig. 4 | Spectral series of AT2017gfo 1.5–4.5 days after the merger. Data are
shown in grey and have been smoothed slightly. Top panel, a model (solid red
lines) consisting of a blackbody (blue dotted lines) with P Cygni profiles (red
transparent fill) for the Sr lines. The rest (vertical black dashed lines) and
observed (vertical blue dashed lines) positions of the model’s Sr lines are shown,
with the blueshift indicated by arrows. Green dotted lines show the Gaussian
emission profiles added to ensure the overall continuum is not biased. A vertical
offset has been applied to each spectrum for clarity, with zero flux indicated by
the dashed horizontal line segments. Bottom panels show the residuals between
model and data.

Extreme-density stars composed of neutrons were proposed shortly 
after the discovery of the neutron”, and identified with pulsars three 
decades later?°. However, no spectroscopic confirmation of the com- 
position of neutron stars has ever been made. The identification here of 
an element that could only have been synthesized so quickly under an 
extreme neutron flux provides the first direct spectroscopic evidence 
that neutron stars comprise neutron-rich matter.

Methods


Spectral synthesis 

We used different codes to compute synthetic absorption spectra, 
namely MOOG v. 2014 and our own single-temperature and single-density
LTE code. In addition, we verified our results using the TARDIS
supernova spectral synthesis code. For the first two codes, we used line
lists gathered from the literature (see Supplementary Information). For
the TARDIS modelling, we used the line lists of Kurucz. Our codes yield
consistent results with the different line lists. 

MOOG is a synthetic spectrum code normally used to generate 
synthetic absorption spectra of photospheres in cool stars under the 
assumption of local thermodynamic equilibrium. It requires a model 
atmosphere that dictates how temperature, gas pressure and electron 
density behave in different layers of the surface gas. Here we adopt 
Kurucz model atmospheres. The second requirement is a line list that
contains the rest wavelength of the absorption transition, the element
or ion in which the transition takes place, the excitation potential of
the lower level, and the oscillator strength. The atomic data are based 
on refs. *'>*° with updates from the National Institute of Standards 
and Technology (NIST). The strengths of the absorption features are 
calculated solving radiative transfer equations with a plane parallel 
treatment of the atmospheres, assuming that the velocity distribution 
is Maxwellian, and that excitations and ionizations are described by the 
Boltzmann and Saha equations, respectively. The line/wing damping 
follows a scaled Unsöld approximation and the source function follows
a simple blackbody, while scattering (on H, He and e⁻) enters mainly
through opacity terms. 

Our own code assumes only a gas in LTE without scattering, and that 
the Boltzmann and Saha equations can be used to obtain the ionization 
and excitation state of each element individually. We then use the line 
lists above and level information from NIST to determine the relative 
strengths of the lines. We adopt a fiducial electron density of log nₑ = 7.8,
based on the mean density of 0.04 M☉ (where M☉ is the mass of the Sun)
of singly ionized material in a sphere with the area of the best-fit blackbody.
The density of the atmosphere is almost certainly lower than this.
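
To make the use of the Saha equation concrete, the sketch below estimates the ratio of singly ionized to neutral Sr at the fiducial temperature (3,700 K) and electron density (log nₑ = 7.8, taken here to be in cm⁻³). It is a minimal illustration, not the code used in this work: partition functions are approximated by ground-state statistical weights (an assumption), the Sr I ionization energy of about 5.695 eV is supplied from standard atomic data rather than from this paper, and the function name is hypothetical.

```python
import math

K_B_EV = 8.617333262e-5    # Boltzmann constant, eV/K
K_B_ERG = 1.380649e-16     # Boltzmann constant, erg/K
M_E = 9.1093837015e-28     # electron mass, g
H_ERG = 6.62607015e-27     # Planck constant, erg s


def saha_ratio(temperature_k, n_e_cm3, chi_ev, g_lower, g_upper):
    """Number ratio n(upper ion) / n(lower ion) from the Saha equation.

    Partition functions are replaced by the ground-state statistical
    weights g_lower and g_upper (a simplifying assumption).
    """
    thermal = (2.0 * math.pi * M_E * K_B_ERG * temperature_k
               / H_ERG ** 2) ** 1.5
    boltzmann = math.exp(-chi_ev / (K_B_EV * temperature_k))
    return 2.0 * (g_upper / g_lower) * thermal * boltzmann / n_e_cm3


T_FIDUCIAL = 3700.0            # K, fiducial temperature from the text
N_E = 10.0 ** 7.8              # cm^-3, fiducial electron density (assumed units)
CHI_SR_I = 5.695               # eV, Sr I ionization energy (not from this paper)

# Ground levels: Sr I 1S0 (g = 1), Sr II 2S1/2 (g = 2).
sr_ii_over_sr_i = saha_ratio(T_FIDUCIAL, N_E, CHI_SR_I, g_lower=1, g_upper=2)
print(f"n(Sr II)/n(Sr I) ~ {sr_ii_over_sr_i:.1e}")
```

Under these assumptions nearly all neutral Sr is ionized at such low electron densities, which is consistent with the features being attributed to ionized rather than neutral species; a full treatment would also include higher ionization stages and proper partition functions.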

The MOOG models and our LTE calculations are reasonably comparable:
for the MOOG models, an effective temperature
(T_eff) at the surface of the photosphere of roughly 5,500 K and a surface
gravity of log g = 0 (following the temperature and density profiles in the
Kurucz model atmospheres) give rise to a temperature of 3,800 K and
an electron density of nₑ = 10⁷ cm⁻³ within the photosphere. Absorption
lines from lanthanide ions are believed to be an important source of 
opacity owing to transitions with unknown oscillator strengths. For an 
LTE plasma, it is likely that such lines are important and create a complex
continuum. However, the lanthanide opacity is extremely high in the
ultraviolet and blue regions of the spectrum. The fact that we detect blue
emission in the spectrum of AT2017gfo is already a strong indication that
lanthanide elements do not dominate the early-continuum spectrum, as
suggested previously. Furthermore, the infrared feature arises from
levels that may be overpopulated owing to optical pumping, enhancing 
the strength of this feature further with respect to the line-generated 
continuum at these wavelengths. 

Synthetic spectra are generated using both codes on the basis of line 
lists containing r-process elements capable of producing strong features 
in an LTE plasma at these temperatures. We include all elements from
₃₃As up to ₈₃Bi, as well as ₉₀Th and ₉₂U. We do not include the elements
₃₄Se, ₃₅Br, ₃₆Kr, ₅₃I and ₅₄Xe as they produce no strong or moderate lines
at these temperatures and are rarely detected in stellar spectra; these
elements have first excitation energies above 5.97 eV for their neutral 
and singly charged ions, giving a fractional population less than 10⁻⁸
at our fiducial temperature. Neither do we include elements with no 
stable isotopes (₄₃Tc and ₆₁Pm), or any molecules. The absorption-line
profiles are dominated by the velocity and density distribution of the 
expanding atmosphere. 
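
The exclusion criterion above rests on a Boltzmann factor. The following sketch evaluates exp(−E/kT) for the 5.97-eV threshold at the fiducial temperature of 3,700 K. Statistical weights and partition functions are ignored, which is an assumption made for brevity, and the function name is illustrative.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant, eV/K


def boltzmann_factor(energy_ev, temperature_k):
    """Relative population exp(-E / kT) of a level at excitation energy E."""
    return math.exp(-energy_ev / (K_B_EV * temperature_k))


# Threshold excitation energy and fiducial temperature quoted in the text.
factor = boltzmann_factor(5.97, 3700.0)
print(f"{factor:.1e}")
```

Levels lying this far above the ground state are so sparsely populated at these temperatures that lines arising from them cannot produce strong features, which is why the corresponding elements are left out of the line lists.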


Our line lists contain the strongest lines for LTE spectra at these tem- 
peratures. Because we are interested in finding strong, isolated lines, 
this procedure should effectively capture all lines that could realistically 
be candidates. 


Could large numbers of weak lines dominate the opacity? 

The opacity of the kilonova is dominated by absorption lines. The list 
of lines that we use for MOOG (see references above) has most of the 
strong lines in common with the Kurucz list** that we use for the TARDIS 
modelling. The results we retrieve from the different techniques and 
line lists are a useful check on the robustness of the modelling meth- 
odologies. Both methods yield consistent results, indicating that the 
overall result presented here is robust to the selection of the specific 
line list and the modelling method chosen. We note that a feature at 
about 810 nm is also produced in the spectral synthesis analysis of 
ref. *, where lists comprising known lines are also used. This feature 
(M. Tanaka, private communication) is produced primarily by the same 
Sr II lines we identify in this work.

The major caveat in identifying line features is the possibility that 
missing lines could have a larger influence on the broad spectral shape 
than the predicted effect from known lines. Of particular concern are 
the large numbers of unknown lines from the lanthanide elements that 
are likely to dominate the line-expansion opacity”. Although we argue 
here that our line lists are reasonably complete in strong lines at these 
temperatures and densities (and given that they are used for model- 
ling stars with similar temperatures and densities, this makes sense), 
it is possible that a very large number of weaker lines could contribute.

However, the line-forming region of the kilonova is likely to be physically
extended, covering a substantial fraction of the kilonova radius,
particularly in the near-infrared. The presence of a P Cygni profile at
around 1 μm supports the idea that a substantial volume (though not
mass) of the kilonova must be largely optically thin at this wavelength.
The mass absorption coefficient of the Sr II lines at around 1.05 μm peaks
at about 4 × 10³ cm² g⁻¹ for lines with a full width at half maximum (FWHM)
of 0.01c, a temperature of 5,000 K and a density of 10°” g cm*. This is 
at least two orders of magnitude higher than the mean value obtained 
for lanthanides such as Ce and Nd in the optically thin limit using the 
Kurucz line lists. Given that the line lists for these elements are likely to
be highly incomplete at these wavelengths, we extrapolate the value
of the Ce line opacity of the Vienna Atomic Line Database (VALD) lines
at 9,000 Å to roughly 1.05 μm, which should give a similar opacity
to the line lists calculated in ref. ” with the autostructure code. When
the lines are extremely optically thick, within the bulk of the kilonova
in the first days, the Ce opacity is about 10 cm² g⁻¹ (compare with ref. *”).
In the optically thin regime in the outer layers, the Ce line opacity rises
by about two orders of magnitude. Using this optically thin extrapolation
of the Ce lines, the Sr II opacity is still a factor of four to five times
higher, not including abundance effects that are likely to make the Sr 
line stronger still. We show an example of this effect by calculating the 
expansion opacity for a low-optical-depth plasma in Extended Data 
Fig. 5. That calculation is purely illustrative, showing how the Sr lines 
can dominate the opacity when the gas has low optical depth. For a 
self-consistent model calculation, see the TARDIS model spectra in 
Extended Data Fig. 6. 


Spectral modelling 

In the spectra we identify what appear to be two separate emission components:
first, a nearly blackbody spectrum modified by absorption
features that appears to cool over time; and second, an emission com- 
ponent at redder wavelengths that increases in strength relative to the 
first component with time. These two components do not necessarily 
arise because of discrete ejection mechanisms, but may reflect the fact 
that different parts of the spectrum probe different physical depths and 
thus physical conditions, through the wavelength-dependent expansion
opacity. Here we focus only on the thermal component in the blue part
of the spectrum and model it as a blackbody with an extended envelope.
We model the second component with Gaussian emission lines in order 
not to bias the overall continuum fit at shorter wavelengths, but do 
not interpret them. However, these features clearly provide important 
information on the composition of the plasma and must be addressed 
in future studies. 

The expansion velocity of the gas can be inferred from the expansion 
of the blackbody from the time of the explosion. Owing to the optical 
thickness of a blackbody, we would only be presented with the front 
face of the explosion. Consequently, pure absorption features in the 
spectrum should be blueshifted by the mean Doppler shift induced by 
the expansion speed of the gas. Conservatively, we allow 0.1–0.3c as
the range of the blueshift, a value that depends on the details of
the geometry of the system, and thus we restrict our search for lines
in the first epoch to rest wavelengths of 350 nm and 810 nm multiplied
by 1.1–1.3.
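
The search windows implied by this prescription can be written out directly. The sketch below simply multiplies the two observed feature centres by the stated factors; the function name is illustrative, and the resulting limits follow this simple multiplicative rule only, so they need not coincide exactly with the approximate ranges quoted in the main text.

```python
def rest_window_nm(observed_nm, factor_lo=1.1, factor_hi=1.3):
    """Rest-wavelength search window for a feature observed at observed_nm."""
    return observed_nm * factor_lo, observed_nm * factor_hi


# Observed centres of the two broad absorption features in the first epoch.
windows = {obs: rest_window_nm(obs) for obs in (350.0, 810.0)}

for obs, (lo, hi) in windows.items():
    print(f"observed {obs:.0f} nm -> rest {lo:.0f}-{hi:.0f} nm")
```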

At the densities of the ejecta, the dominant source of opacity is 
expansion opacity”. This effect is able to establish an apparent ther- 
malization through photo-equilibration of the states”. With wavelength- 
dependent opacity, the physical depth traced at each wavelength varies. 
Because the large majority of lines are at the blue end of the spectrum, 
the expansion opacity there will be higher and, conversely, the physi- 
cal depth shallower. This causes the relative strength of ultraviolet/ 
near-infrared lines to change compared with the pure LTE transmis- 
sion values, with bluer absorption lines being less prominent relative 
to near-infrared ones. Additionally, because the population of states 
is photo-equilibrated, metastable states will be enhanced relative to 
non-metastable, as compared with LTE values”*. It is therefore impos- 
sible, primarily because of the strongly wavelength-dependent opacity, 
to use a simple comparison of LTE line strengths across very different 
wavelengths. Instead, we use independent optical depth parameters (τ)
for the two absorption feature fits here. We also use the TARDIS code
(see below) to achieve a more self-consistent treatment with a moving
atmosphere and line-expansion opacity, which shows the simultaneous
presence of the Sr II features at around 0.4 μm and 1 μm.


P Cygni modelling 

The expansion velocity of the photosphere is very high (0.2-0.3c). At 
the measured temperature of the photosphere, the thermal widths 
of individual lines are very narrow compared with the gross velocity 
structure. This means that the resonance region is very small and the 
Sobolev approximation can be used in the Elementary Supernova model 
as a prescription for the absorption structure near isolated lines”. We 
use the implementation of the P Cygni profile in the Elementary Supernova
from https://github.com/unoebauer/public-astro-tools, where the
profile is parametrized in terms of the rest wavelength, λ₀, the optical
depth of the line, τ, two scaling velocities for the radial dependence of
τ, the photospheric velocity, and the maximal velocity of the ejecta. The
latter two parameters specify the velocity stratification. The expansion 
velocity of the photosphere is simultaneously used for the relativistic 
Doppler correction to the blackbody temperature. In addition, because 
the implementation of the P Cygni profile that we are using does not 
include the relative population of the states in the transition, we have 
included a parameter for enhancement/suppression of the P Cygni 
emission component. 

For practical reasons, we cannot fit all lines simultaneously. However, 
fortunately, a handful of lines provides most of the opacity. Because the 
relative opacity dictates the apparent strengths of the lines, we divide 
the spectrum into ultraviolet/blue and red/infrared regions to find 
the lines that will be strongest in their respective spectral region. We 
do this because the opacity changes so severely from the infrared to 
the optical (Fig. 3). We make the division at 600 nm where the opacity 
increases sharply; however, choosing 550 nm or 700 nm makes no dif- 
ference. We then include the strongest lines in each region (all lines with 
a minimum strength of 20% of the strongest line). The resulting lines are
the strong resonance lines from the ground state of Sr II at 407.771 nm
and 421.552 nm, and the lines from the Sr II 4p⁶4d metastable states at
1,032.731 nm, 1,091.489 nm and 1,003.665 nm. These lines are all modelled
using the same P Cygni profile prescription, where the relative
strengths of each of the lines in the two absorption complexes are set 
by the LTE relations, and despite the relative simplicity of the analysis, 
this approach provides a surprisingly good fit to the data. 
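
For orientation, the sketch below shifts the five Sr II rest wavelengths listed above to the observer frame for material approaching at 0.23c, the blueshift adopted for the transmission spectra. It uses the relativistic line-of-sight Doppler formula, which is an assumption made here for illustration: the actual shift depends on the geometry of the ejecta, and the P Cygni fits treat the velocities as free parameters.

```python
import math

# Sr II rest wavelengths quoted in the text, in nm.
SR_II_LINES_NM = [407.771, 421.552, 1003.665, 1032.731, 1091.489]


def observed_wavelength_nm(rest_nm, beta):
    """Observed wavelength for gas approaching the observer at v = beta * c."""
    return rest_nm * math.sqrt((1.0 - beta) / (1.0 + beta))


BETA = 0.23  # blueshift adopted for the LTE transmission spectra
shifted = [observed_wavelength_nm(w, BETA) for w in SR_II_LINES_NM]

for rest, obs in zip(SR_II_LINES_NM, shifted):
    print(f"{rest:9.3f} nm -> {obs:7.1f} nm")
```

With this assumption the near-infrared triplet lands at roughly 800 to 860 nm and the resonance lines fall below about 340 nm, in line with the two broad absorption components seen in the first epoch.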

The final model that we use to fit the spectrum is a relativistically cor- 
rected blackbody photosphere absorbed by an expanding atmosphere, 
containing the five above-mentioned Sr II transitions, described by 
independent optical depths for the infrared and ultraviolet lines. The 
ratios of the lines internally in each set are defined by their LTE strengths. 
In the fitting model we also use two additional Gaussian emission lines
at long wavelengths from the second emission component in order not 
to bias the long-wavelength continuum fit. The best-fit parameters and 
their associated errors are found by sampling the posterior probability 
distributions of the parameters, assuming flat priors on all parameters. 
The fitting framework used is LMFIT and the sampling is done using
emcee. We initiate 100 samplers, each sampling for 1,000 steps. We
discard the first 100 steps as a burn-in phase of the Markov chain Monte 
Carlo (MCMC) chains. We use the median of the marginalized posterior 
probability distribution as the best-fit values, and the 16th and 84th 
percentiles as the uncertainties. The best-fit models are shown in Fig. 4. 
The objective function, being highly nonlinear, causes the posterior 
probability distributions to be highly complex and the best-fit values 
difficult to optimize. However, the peaks of the distributions are well 
centred, meaning that the best-fit values are well constrained, regardless 
of the complexity of the posterior probability distribution. 
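
The summary statistics described above (burn-in removal, median, and 16th and 84th percentiles) can be sketched with the standard library alone. The chains below are synthetic Gaussian draws standing in for real sampler output, so the numbers are purely illustrative; the parameter value of 0.25 with a scatter of 0.02 is hypothetical, and neither LMFIT nor emcee is called.

```python
import random
import statistics

N_WALKERS, N_STEPS, N_BURN = 100, 1000, 100  # values quoted in the text


def summarize(chains, n_burn):
    """Median and 16th/84th percentiles after discarding the burn-in steps."""
    kept = [x for chain in chains for x in chain[n_burn:]]
    cuts = statistics.quantiles(kept, n=100, method="inclusive")
    # cuts[i - 1] is the i-th percentile (99 cut points in total).
    return cuts[49], cuts[15], cuts[83]


# Synthetic stand-in for sampler output: one parameter, hypothetical values.
rng = random.Random(0)
chains = [[rng.gauss(0.25, 0.02) for _ in range(N_STEPS)]
          for _ in range(N_WALKERS)]

median, p16, p84 = summarize(chains, N_BURN)
print(f"{median:.3f} +{p84 - median:.3f} -{median - p16:.3f}")
```

Reporting the median with the 16th and 84th percentiles gives a central value and an interval containing about 68% of the posterior samples, without assuming that the distribution is Gaussian.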


Expansion velocity evolution 

The fits constrain two independent parameters that can be used to 
infer the velocity of the ejected material: the photospheric expansion 
velocity, used to determine the width of the P Cygni line profile; and the
blackbody radius, which scales with the square root of the observed 
luminosity and can be converted to an expansion velocity on the basis 
of the time of observation. These two parameters are uncorrelated, as 
supported by the MCMC posterior probability function samples, and 
therefore constitute two independent measurements of the same physical 
quantity. We show a plot of the evolution of these two parameters in 
Extended Data Fig. 4. The correspondence between the two estimates 
of the expansion velocity is striking, especially given that the ratio of 
the estimates is geometry dependent, and we have assumed only simple 
spherical symmetry here. Only the first epoch shows a somewhat dis- 
crepant value, and there we do not expect a P Cygni model to be entirely 
applicable. This close correspondence between the two independent 
measures and the reasonable values inferred further supports the valid- 
ity of the line identification and the overall model. 
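
As a rough illustration of the second estimate, the blackbody radius follows from the Stefan-Boltzmann law and can be divided by the time since the merger to give a mean expansion velocity. The sketch below is non-relativistic (the fits described above use a relativistically corrected blackbody), the function names are hypothetical, and the input numbers are placeholders chosen only to be of a plausible order of magnitude, not measured values.

```python
import math

STEFAN_BOLTZMANN = 5.670374419e-8  # W m^-2 K^-4
SPEED_OF_LIGHT = 2.99792458e8      # m s^-1


def blackbody_radius(luminosity_w, temperature_k):
    """Radius of a sphere radiating as a blackbody: L = 4 pi R^2 sigma T^4."""
    return math.sqrt(
        luminosity_w / (4.0 * math.pi * STEFAN_BOLTZMANN * temperature_k**4)
    )


def expansion_velocity(luminosity_w, temperature_k, days_since_merger):
    """Mean expansion velocity R / t, assuming expansion started at t = 0."""
    seconds = days_since_merger * 86400.0
    return blackbody_radius(luminosity_w, temperature_k) / seconds


# Illustrative inputs only (assumed, not taken from the fits).
v = expansion_velocity(luminosity_w=5e34, temperature_k=4800.0, days_since_merger=1.5)
print(f"v = {v / SPEED_OF_LIGHT:.2f} c")
```

Because the radius scales with the square root of the luminosity, a factor-of-four error in luminosity changes the inferred velocity by only a factor of two.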


TARDIS modelling 

TARDIS” is a Monte Carlo radiative-transfer spectral synthesis code, 
in which photons are essentially propagated through an expanding 
atmosphere. Each photon will at any point have a probability of being 
absorbed by an atomic transition, this probability being based on the 
wavelength of the photon, the strength of the line, and the density of 
atomic species and electron populations. A synthetic spectrum can then 
be constructed by collecting the emergent photons. 
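
The core sampling idea can be shown with a toy example. The sketch below is not TARDIS and does not use its API; it is a minimal illustration, with a hypothetical function name, of how a Monte Carlo code decides whether a photon packet interacts with a line of a given optical depth. It ignores wavelength dependence, geometry and re-emission.

```python
import math
import random


def escape_fraction(line_optical_depth, n_packets=100_000, seed=1):
    """Fraction of packets crossing a layer without a line interaction."""
    rng = random.Random(seed)
    escaped = 0
    for _ in range(n_packets):
        # Optical depth the packet can travel before interacting,
        # drawn from an exponential distribution: tau = -ln(xi), xi in (0, 1].
        tau_event = -math.log(1.0 - rng.random())
        if tau_event > line_optical_depth:
            escaped += 1
    return escaped / n_packets


# Compare the sampled escape fraction with the analytic value exp(-tau).
for tau in (0.1, 1.0, 3.0):
    print(tau, escape_fraction(tau), math.exp(-tau))
```

The sampled fraction converges on exp(-tau) as the number of packets grows, which is the behaviour a synthetic spectrum inherits line by line.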

To generate the synthetic spectra using TARDIS, we set up the physi- 
cal models using the inferred photospheric expansion velocities at the 
observed epochs. For homologously expanding ejecta, the velocities 
of the atmosphere layers are at all times specified by the outer-edge 
expansion and the photospheric expansion. We use the measured photo- 
spheric expansion velocity as the inner expansion velocity and select the 
outer atmospheric velocity such that the bluest edge of the developed 
absorption profiles in the synthetic spectra match the observed ones. 
At present, TARDIS supports only spherically symmetric explosions, so 


for simplicity we adopt this geometry. The kilonova ejecta are in most 
cases likely to be asymmetric, owing to the preferential motion of the 
mass in the plane of the orbit of the two neutron stars. The neglect of 
deviation from spherical symmetry most likely affects the absorption 
profiles and the inferred mass in the atmosphere, as we could potentially 
only be seeing ejecta in a cone. Additionally, TARDIS assumes a single 
photospheric velocity across the entire wavelength range. Owing to the 
strong wavelength dependence of the opacity, as discussed earlier, the 
depth at which the photons escape varies across the spectral coverage. 
Therefore, the same reservations about inferring the mass in a given shell 
at a given wavelength apply to the TARDIS simulations. This can be seen 
in effect when choosing an ejecta density that matches the absorption 
feature at 350 nm, because then the strength of the 810-nm absorption 
feature is greatly overpredicted. Conversely, selecting an ejecta density 
that matches the 810-nm absorption feature underpredicts the strength 
of the 350-nm absorption. 

At each epoch, the temperature of the photosphere is chosen so that 
an atmosphere with no lines returns a blackbody-like spectrum that is 
similar to the best-fit blackbody found in simple P Cygni model fits. 
Both the excitation and the ionization structure of the elements in the 
atmosphere are set according to LTE, where we assume for simplicity a 
constant temperature throughout the atmosphere. This approach does 
not capture optical pumping of metastable states and other non-LTE 
effects that will change the population of the upper levels. 

For the input abundances, we use the solar r-process abundance ratio 
as shown in Fig. 2, starting from ₃₁Ga. We run the simulation in three steps, 
consecutively including heavier elements. For the first set of simulations, 
we include only the elements from ₃₁Ga to ₃₇Rb and, as can be seen 
in Fig. 2, no lines cause a substantial deviation from a pure blackbody. 
Next we include ₃₈Sr, which forms the strong feature observed centred 
at 810 nm in the first epoch, almost exclusively owing to the three strong 
Sr II lines at around 1 μm. Finally we run the same simulation, including 
all elements from ₃₁Ga to ₉₂U. The feature at 810 nm is unaffected by the 
inclusion of the heavier elements. 

For the density, we initially adopt a power-law density structure of 
the ejecta, parametrized in terms of velocity and epoch: ρ(v, t) = 
ρ₀(t/t₀)⁻³(v/v₀)ⁿ. We find that the line shapes depend on the assumed slope, 
where for steeper slopes a larger fraction of the line absorption is closer 
to the line centre. We specify a density profile of n = −3, as in ref. ”, as 
this is supported by the theoretical models and seems to reproduce the 
absorption profiles relatively well. As also investigated in ref.*°, there is 
some freedom in the choice of slope, as it is not well constrained from 
a modelling perspective and could have different values depending on 
the matter-ejection mechanism. 
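
A minimal sketch of such a power-law profile is given below, reading the density law as ρ(v, t) = ρ₀(t/t₀)⁻³(v/v₀)ⁿ, where the t⁻³ dilution is the standard scaling for homologous expansion. The function names and the numerical values are hypothetical and serve only to show the scaling; they are not parameters of the models in the text.

```python
import math


def ejecta_density(v, t, rho0, v0, t0, n=-3.0):
    """Power-law density rho0 * (t/t0)^-3 * (v/v0)^n for homologous ejecta."""
    return rho0 * (t / t0) ** -3 * (v / v0) ** n


def shell_mass(v_inner, v_outer, rho0, v0, t0, n=-3.0):
    """Mass between two velocity coordinates.

    Obtained by integrating 4 pi r^2 rho dr with r = v t; the factors of t
    cancel, so the result does not depend on epoch.
    """
    prefactor = 4.0 * math.pi * rho0 * t0**3 * v0 ** (-n)
    if n == -3.0:
        return prefactor * math.log(v_outer / v_inner)
    return prefactor * (v_outer ** (n + 3) - v_inner ** (n + 3)) / (n + 3)


# Placeholder values in SI units (assumed for illustration only).
rho0, v0, t0 = 1.0e-11, 6.0e7, 1.5 * 86400.0
print(ejecta_density(2.0 * v0, t0, rho0, v0, t0) / rho0)  # density at twice v0
print(shell_mass(v0, 1.5 * v0, rho0, v0, t0))             # mass in the shell
```

The shell mass is linear in ρ₀, so scaling ρ₀ down by some factor scales the mass between fixed velocity coordinates by the same factor.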

Adopting a single ρ₀ across all four epochs, with n = −3, does not yield 
synthetic spectra that match the observed spectra well around the 810-nm 
₃₈Sr absorption feature across the epochs. If ρ₀ is chosen to reproduce 
the strength of the ₃₈Sr absorption feature of the first epoch, the strength 
of the absorption feature is greatly overpredicted in the later epochs 
using the same composition and assuming homology; the ejecta density 
has to be scaled down by a factor of five in the subsequent epochs 
to match the spectrum. In other words, the observed mass of Sr in the 
optically thin part of the spectrum inferred from the TARDIS model 
for the first-epoch spectrum appears to be much larger than for the 
later epochs. Specifically, atmosphere masses of 5 x 10°M.,1x10°M., 
1.2x10°M,and1.3 x 10> of ,.Sr are required to reproduce the observed 
absorption feature at 810 nm for the first four epochs respectively. 

These numbers should be treated with some caution as these are 
derived masses assuming spherical symmetry, a fixed photospheric 
velocity, and no correction for light travel time effects. They must be 
interpreted as lower limits to the total amount of material ejected, as 
they trace only the matter between the photospheric front and the outer 
atmosphere. Using the assumed solar abundances, these masses cor- 
respond to this atmosphere having approximately 1% of the total ejecta 
mass inferred from lightcurve modelling». 


The TARDIS models also constrain the amount of the heavier r-process 
elements present in the outer, transparent layers of the ejecta. Using the 
solar r-process abundances with the inclusion of the heaviest elements, 
the TARDIS synthetic spectra exhibit almost continuous absorption up 
to around 6,000 Å, which is not seen in the observed spectra. This point 
was also touched upon earlier. The exact limit to the amount of heavy 
r-process material in the outer layers is difficult to infer accurately, on 
the basis of the simple models used, but our modelling indicates that the 
ratio of heavy to light element abundance in this layer is much smaller 
than the solar r-process ratio. This conclusion is consistent with the 
inference made by other authors on the basis of the early blue colour 
of the continuum spectrum*”. 

The inability of a single composition and density to reproduce the 
spectra across the first four epochs may hint at a change in the elemental 
abundance ratios as the photosphere recedes further into the ejecta. 

The TARDIS models demonstrate that an isolated feature observed 
at 810 nm can be produced by Sr and that no other known lines form 
this feature. Additionally, the models hint at a possible variation in the 
abundances as the deeper layers of the ejecta component are exposed, 
in line with what is suggested by some models of neutron-star mergers. 


Exclusion of Cs I and Te I identification 

The Cs I 6s–6p resonance transitions” would of course require Cs I to be 
present in the gas. But because Cs has the lowest first ionization potential 
of any element, the singly charged ions of other elements inevitably 
synthesized with Cs, such as La II, Eu II and Gd II, are millions of times 
more abundant than Cs I in an LTE plasma at close to the observed blackbody 
temperatures. This problem is even worse at temperatures that 
produce substantial strong lines from Te I. These other elements will 
cause absorption lines that are at least two orders of magnitude stronger 
in the same wavelength region as the proposed Cs and Te lines (for 
example, the 706.62 nm, 742.66 nm or 929.05 nm lines of La II, Eu II and 
Gd II respectively, to name one of each). The same argument holds for the 
excited-state transition of Te I, which has a very high excitation energy 
of 5.49 eV; the relative population of the Te I excited state is extremely 
low, less than 10”. Thus, no realistic scenario exists in which lines from 
either of these species can be detected without being dominated by lines 
from other elements that are orders of magnitude stronger. 
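
To give a feel for why a 5.49 eV excitation energy implies such a small excited-state population, the Boltzmann factor exp(−E/k_BT) can be evaluated directly. This is only an order-of-magnitude sketch: it ignores statistical weights and the partition function, the function name is hypothetical, and the temperatures are assumed illustrative values rather than fitted ones.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV per kelvin


def boltzmann_factor(excitation_energy_ev, temperature_k):
    """Relative LTE population exp(-E / kT), ignoring statistical weights."""
    return math.exp(-excitation_energy_ev / (BOLTZMANN_EV_PER_K * temperature_k))


# Assumed temperatures, chosen only to illustrate the steep dependence.
for temperature in (3000.0, 4000.0, 5000.0):
    print(temperature, boltzmann_factor(5.49, temperature))
```

The factor falls steeply as the temperature drops, so cooler photospheres suppress the excited state even further.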
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Extended Data Fig. 1| Synthetic r-process-element transmission spectra. of species are also shown. The green dotted line shows the heavy r-process 
These spectra were generated using MOOG, in which the relative abundances elements (;,Ba to 9,U); the blue dotted line shows the light r-process elements 
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broadened and normalized as in Fig. 3. The solid black line is the total Sr stands out in absorption, regardless of the composition of the material. The 
transmission spectrum for an atmosphere containing all the r-process elements normalization is arbitrary and different to the LTE equivalent in Fig. 3 for display 
(,3AS to 99U). The dashed black line is the same spectrum, but including only the reasons. 
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Extended Data Fig. 2 | Synthetic r-process transmission spectra. The spectra were generated with MOOG and are similar to those shown in Extended Data Fig. 1, 
except that all element contributions are displayed individually. The elements that contribute most at the reddest wavelengths are noted within the plotted line. 
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Extended Data Fig. 4 | Evolution of the ejecta expansion velocity. The velocities were determined independently from the P Cygni absorption line widths (blue 
points) and the blackbody radius (red points). Uncertainties shown are 1σ. The correspondence between the two independent estimates is striking. 
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Extended Data Fig. 5 | Comparison of the expansion opacities at modest density of 8.4« 10 g cm® for Sr or Ce, an electron density of 7.6 x 10®°cm®°, anda 
optical depths for Sr and Ce. This calculation shows the potential of Sr to 1% atmospheric radius at 1.5 days after the explosion. Line lists used for Sr and Ce 
dominate the opacity at around 1 pm at low optical depths. The opacities are are from the Kuruczand VALD databases respectively. 
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Extended Data Fig. 6 | Radiative transfer models from the first four epochs including all elements from 3,Ga to ,.U. These models show that the spectra are 
using the TARDIS code. The blue line is the synthetic TARDIS spectrum using well reproduced with elements around the first r-process abundance peak, 
relative solar r-process abundances and including elements from ;,Ga to ;,Rb— specifically Sr. 
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Because of their ability to confine light, optical resonators!’ are of great importance to 
science and technology, but their performance is often limited by out-of-plane- 
scattering losses caused by inevitable fabrication imperfections**. Here we 
theoretically propose and experimentally demonstrate a class of guided resonances in 
photonic crystal slabs, in which out-of-plane-scattering losses are strongly suppressed 
by their topological nature. These resonances arise when multiple bound states in the 
continuum—each carrying a topological charge’—merge in momentum space and 
enhance the quality factors Q of all nearby resonances in the same band. Using such 
resonances in the telecommunication regime, we experimentally achieve quality 
factors as high as 4.9 × 10⁵ (12 times higher than those obtained with standard 
designs), and this enhancement remains robust for all of our samples. Our work paves the way 
for future explorations of topological photonics in systems with open boundary 
conditions and for their application to the improvement of optoelectronic devices in 
photonic integrated circuits. 


Topological defects’ are ubiquitous in the natural world. Examples range 
from quantum vortices in superfluids to singular optical beams®, which 
are characterized by the non-trivial winding patterns of system param- 
eters (velocity, phase or polarization) in real space. Recently, it was found 
that unexpected topological defects can also emerge in the momentum 
space of a crystal and give rise to interesting physical consequences; one 
such example is the optical bound states in the continuum (BICs). BICs 
reside inside the continuous spectrum of extended states but counter- 
intuitively remain perfectly localized in space and their lifetimes are 
theoretically infinitely long. Since their initial proposal’, BICs have been 
observed in a variety of wave systems, including photonic!*’, phononic® 
and water waves”. Furthermore, they have been used to enhance various 
applications, such as lasers’*® and antennas”, by providing an out- 
coupling channel through their surface-emitting nature. In photonic 
crystal slabs, it has been identified that their fundamental nature is 
topological; they are essentially topological defects in polarization 
directions defined in momentum space’. In practice”, the quality 
factors of such BICs are often much lower than their theoretical prediction 
of infinity, limited to only about 10⁴. Aside from other contributing 
factors, such as material absorption or the finite size of samples, the 
main limiting factor of the Q value of BICs is scattering losses caused by 
fabrication imperfections or disorders—a common problem for many 
high-Q on-chip resonators’*°. 

Here we theoretically propose and experimentally demonstrate 
on-chip photonic resonances that are much less susceptible to out-of- 
plane-scattering losses than expected, owing to their unique topologi- 
cal features. We start by showing that resonances with ultrahigh Q can 
be achieved by merging multiple BICs. First, we consider a photonic 
crystal slab (Fig. 1a), in which a square lattice (periodicity a = 519.25 nm) 
of circular air holes (radius r = 175 nm) is patterned in silicon (thickness 
h = 600 nm). With the use of numerical simulations (using the COMSOL 
Multiphysics software), we focus on the transverse electric (TE) A 
band (red line), featuring nine BICs where Q diverges to infinity (Fig. 1b). 
The topological nature of the BICs can be understood from the cor- 
responding far-field polarization plots (Fig. 1b, bottom panels), where 
each BIC appears as a topological defect (vortex) in the polarization 
long axes*”””°”23 characterized by an integer topological charge of ±1. 
Among these nine vortices, one is pinned at the centre of the Brillouin 
zone owing to symmetry, whereas the locations of the remaining eight 
can be controlled by varying system parameters such as the periodicity. 
For example, when a increases from 519.25 nm to 531.42 nm, the eight 
off-centre BICs move towards the centre until all nine of them merge 
into a single BIC with a charge of +1 when a= 531.42 nm. This single BIC 
persists when a further increases to 580 nm. 

The topological configuration of BICs controls radiative losses of 
all nearby resonances. Specifically, Q is shown to decay quadratically 
(Q ∝ 1/k²) with respect to the distance k (in momentum space) from a single 
isolated BIC with charge +1; however, this scaling changes to Q ∝ 1/k⁶ 
in the configuration in which all nine BICs merge (referred to as ‘merging- 
BIC design’ hereafter). A comparison between these two scenarios is 
shown in Fig. 1c, where the Q values in a merging-BIC design (red) are 
always orders of magnitude higher than those in an isolated-BIC design 
(blue) along all directions in k space, owing to their fundamentally different 
scaling properties. This difference in scaling originates from the 
asymptotic behaviour of Q ∝ 1/[k²(k + k_BIC)²(k − k_BIC)²] in the regime in which 
off-centre BICs at ±k_BIC and the centred BIC coexist, as shown in Fig. 1c (grey 
lines). In the merging-BIC design, k_BIC → 0 and we get Q ∝ 1/k⁶. Further 
discussion is provided in Supplementary Information sections I and II. 
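
The change in scaling can be checked numerically. The sketch below reads the asymptotic form as Q ∝ 1/[k²(k + k_BIC)²(k − k_BIC)²] and measures the logarithmic slope d(ln Q)/d(ln k) in different regimes. The function names, the unit prefactor and the sample k values are assumptions made for illustration; k is in arbitrary units and must not be evaluated exactly at 0 or at k_BIC, where Q diverges.

```python
import math


def q_isolated(k, scale=1.0):
    """Q near a single isolated BIC at k = 0: Q proportional to 1/k^2."""
    return scale / k**2


def q_with_offcentre_bics(k, k_bic, scale=1.0):
    """Q with BICs at k = 0 and k = +/- k_bic along one direction."""
    return scale / (k**2 * (k + k_bic) ** 2 * (k - k_bic) ** 2)


def log_slope(q_func, k1, k2):
    """Average slope of ln Q against ln k between k1 and k2."""
    return math.log(q_func(k2) / q_func(k1)) / math.log(k2 / k1)


# Merging-BIC limit: k_bic -> 0 gives a slope of -6.
merged = log_slope(lambda k: q_with_offcentre_bics(k, 0.0), 0.01, 0.02)
# Isolated BIC: slope of -2.
isolated = log_slope(q_isolated, 0.01, 0.02)
# Far inside the off-centre BICs (k << k_bic) the slope is again close to -2.
inner = log_slope(lambda k: q_with_offcentre_bics(k, 1.0), 1e-3, 2e-3)
print(merged, isolated, inner)
```

The steeper slope of the merged configuration is what keeps Q high over a wide neighbourhood of the zone centre, rather than only at the BIC itself.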

Although simulation results of infinitely large perfect photonic crys- 
tals reproduce radiative quality factors, real samples (schematically 
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Fig. 1| Suppressing radiation losses by merging multiple topological 
charges. a, Left, Schematic of a photonic crystal slab and the factors 
contributing to loss. Right, Simulated band structure. The TE A band is marked 
with a red line, and ω is the normalized resonance frequency. b, Multiple BICs 
appear on band TE A, where the radiative quality factor Q diverges. Top, 
Simulated Q for various values of the sample periodicity, a. Bottom, far-field 
polarization plots. When a is tuned from 519.25 nm (left) to 580 nm (right), nine 
BICs at k_BIC with topological charge ±1 merge into an isolated BIC with charge +1. 
c, Simulated Q before (a = 519.25 nm; grey) and after charges merge at the centre 
of the Brillouin zone (a = 580 nm; blue). The transition (a = 531.42 nm; red) 
corresponds to the merging-BIC configuration, which shows considerably 
higher quality factors than the isolated-BIC configuration (blue). This is caused 
by a change to a scaling rule of Q ∝ 1/k⁶, which is observed along both the Γ–X and 
Γ–M directions. All simulations used the finite-element method in COMSOL. 


shown in Fig. 2a) feature some major differences that determine the 
highest Q achievable in practice. First, all samples are finite in size; their 
boundaries introduce fractional orders of the primitive reciprocal lattice 
in k space (green dots in Fig. 2a; see Supplementary Information section 
III for details)”*. Second, all fabricated samples exhibit disorder and 
imperfections with both long- and short-range correlations, allowing 
modes at different k points to couple to each other. Because of these 
inevitable coupling terms, modes at different fractional momentum 
orders are hybridized and all of their loss channels become available 
to the final resonance”. 
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Fig. 2| Topological protection against scattering losses. a, Schematic ofa 
fabricated photonic crystal sample (solid lines; top), with disorder in the 
locations and radii of the holes (dashed lines; top). In reciprocal space, fractional 
orders of momentum (green dots; bottom) are introduced by the supercell. N is 
the size of the supercell in increments of the periodicity a and By, is the minimum 
step size of the fractional orders of momentum. b, Momentum-energy 
distribution of the highest-Q mode in the merging-BIC design for a perfect (top) 
and a disordered (bottom) structure inside the first Brillouin zone. a.u., arbitrary 
units. c, Momentum-energy distribution of the radiation field in the disordered 
merging-BIC (top) and isolated-BIC (bottom) designs. The white circles 
represent the light cone—the region in the momentum space where guided 
resonance can couple to the radiation channel, as determined by the structure of 
the sample. The scattering loss is considerably lower in the merging-BIC sample 
than in the isolated-BIC sample. d, Schematic of an asymmetric hole (top) acting 
as a fabrication imperfection, and simulated Q values near the centre of the 
Brillouin zone obtained by the application of disorder (bottom). All simulations 
were performed in a 15a × 15a supercell (N = 15). 


The advantage of our merging-BIC design over an isolated-BIC 
design is confirmed by simulations (using the COMSOL Multiphysics 
software) of perturbed 15 × 15 photonic crystal supercells. In a perfect 
supercell structure without disorder, a BIC with infinite Q remains at 
the centre of the Brillouin zone (Fig. 2b, upper panel). For comparison, 
perturbations are applied to both the radii (Δr) and positions (Δx, Δy) 
of the holes according to the statistics that best describe our samples 
(Fig. 3). As expected, each mode in the disordered samples has multi- 
ple components in k space. Furthermore, resonances in a disordered 
sample with a merging-BIC design have considerably lower radiation 
fields than those from an isolated-BIC design with the same disorder 
(Fig. 2c). This result agrees well with Fig. 1b, c: all modes contributing 
to the final resonance in the merging-BIC sample have much higher 
Q values than those in the isolated-BIC case because resonances in 
the former are much more immune to out-of-plane scattering from 
disorder than in the latter. Finally, this enhancement of Q is found to 
be robust across a range of k values, as shown in Fig. 2d (bottom panel), 
by assuming asymmetric holes to represent typical fabrication errors 
before applying the disorder (see Supplementary Information sections 
IV and V for details). 

To verify our theoretical findings, we fabricate photonic crystal sam- 
ples with both merging-BIC and isolated-BIC designs using the same 
electron-beam lithography and inductively coupled plasma etching 
processes on a 600-nm-thick silicon-on-insulator wafer (see Methods 
for details). The underlying SiO₂ layer is then removed to restore the up-down 
mirror symmetry required for tunable BICs*””. Alternatively, one 
may use refractive-index-matching liquid or deposition layers instead. 
The samples are about 250 × 250 μm² in size. The periodicity is varied 
from 530 nm to 580 nm to sample designs with merging and isolated 
BICs. From the scanning electron microscope images of the samples 
(Fig. 3a, b), the standard deviations of the hole locations and radii are 
estimated to be about 5 nm, which is used in the simulations discussed 
earlier. 

A schematic of the experimental setup is shown in Fig. 3c. A tunable 
telecommunication laser with light in the C+L band is first sent 
through an X-polarizer before the light is focused by a lens (L1) onto 
the back focal plane of an infinity-corrected objective lens. The inci- 
dent angle of the laser on the sample is thus controlled by moving L1 
on the x-y plane. Using this confocal setup, reflected and scattered 
light are also collected by the same objective; they are then magni- 
fied 1.67 times through a relay 4f system and imaged on a camera. A 
Y-polarizer is used to block reflected light (X-polarized) while allowing 
scattered light to pass (see Methods and Supplementary Information 
section VI). Under the on-resonance coupling condition, where the 
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Fig. 4 | Experimental results. a, Isofrequency contours of the sample at 
different wavelengths are observed on the camera (right). Three examples are 
shown as dashed lines. The scattered-light intensity at different points in 
momentum space (X, Y, Z; left) is further characterized using a photodiode, and 
is fitted by symmetric Lorentzians as a function of incident wavelength. The 
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Fig. 3| Experimental setup. a, b, Scanning electron microscope images of the 
fabricated photonic crystal sample; top (a) and side (b) view. c, Schematic of the 
measurement setup. The blue lines represent the incident light and its direct 
reflection. The light red region denotes radiation losses induced by scattering 
from disorder. L, lens. BS, beamsplitter. 


photonic crystal sample supports a resonance at the same wavelength 
as the incident light at that incident angle, isofrequency contours are 
observed on the camera, similarly to previously reported results””°. 
Three examples of isofrequency contours are schematically shown in 
Fig. 4a as dashed lines. 

The Q values of resonances at different k points are further characterized 
using scattered light. Specifically, a movable pinhole (not shown in 
Fig. 3c) is placed on the image plane of the objective’s rear focal plane 
to specify a k point. A photodiode connected to a lock-in amplifier is 
placed behind the pinhole to record the intensity of the scattered light 
as a function of the wavelength of the tunable laser. As shown in Fig. 4a, 
when different k points are selected by the pinhole (X, Y and Z), different 
scattering spectra are obtained, all exhibiting symmetric Lorentzian 
features. Similar scattering phenomena have been observed before”® 
and can be understood as follows: the intensity of scattered light is gov- 
erned by the spectral density of states of the sample at that k point, and is 
described by a Lorentzian function centred at the resonance frequency, 
with linewidth determined by the Q value of the resonance (see Sup- 
plementary Information section VII for details). 
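
The relation between linewidth and quality factor can be made concrete with a short sketch. For a narrow resonance, Q is approximately the centre wavelength divided by the full width at half maximum (FWHM) of the Lorentzian. The code below builds a synthetic spectrum and recovers Q from its half-maximum crossings; the function names, the 1,550 nm centre wavelength and the sampling grid are assumptions for illustration, and a real analysis would fit the Lorentzian to measured data instead.

```python
def lorentzian(wavelength, centre, fwhm, amplitude=1.0):
    """Lorentzian line shape with peak value `amplitude` at `centre`."""
    half_width = fwhm / 2.0
    return amplitude * half_width**2 / ((wavelength - centre) ** 2 + half_width**2)


def quality_factor(centre, fwhm):
    """Q of a narrow resonance: centre wavelength over linewidth."""
    return centre / fwhm


def fwhm_from_samples(wavelengths, intensities):
    """Estimate the FWHM of a single peak by interpolating the half-maximum crossings.

    Assumes zero background and a scan that extends past both crossings.
    """
    half_max = max(intensities) / 2.0
    above = [i for i, y in enumerate(intensities) if y >= half_max]
    first, last = above[0], above[-1]

    def crossing(i, j):
        # Linear interpolation between neighbouring samples i and j.
        x0, y0 = wavelengths[i], intensities[i]
        x1, y1 = wavelengths[j], intensities[j]
        return x0 + (half_max - y0) * (x1 - x0) / (y1 - y0)

    return crossing(last, last + 1) - crossing(first - 1, first)


# Synthetic scan (assumed values): centre at 1,550 nm, sampled every 0.0001 nm.
centre_nm = 1550.0
true_q = 4.9e5
wavelengths = [centre_nm + (i - 200) * 1e-4 for i in range(401)]
spectrum = [lorentzian(w, centre_nm, centre_nm / true_q) for w in wavelengths]
q_estimate = quality_factor(centre_nm, fwhm_from_samples(wavelengths, spectrum))
print(f"Q = {q_estimate:.3g}")
```

At this Q the linewidth is only a few picometres, which is why a finely tunable laser and lock-in detection are needed to resolve it.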
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linewidth is determined by the Q value of the underlying resonance. b, c, The 
highest Q observed in the merging-BIC sample is 4.9 × 10⁵ at point W (b), which is 
more than an order of magnitude larger than that of an isolated-BIC sample 
constructed with the same fabrication process (Q = 4.0 × 10⁴; c). 
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Fig. 5|12-fold enhancement of Q via topological protection. a, Dispersion of 
resonances measured at different points in momentum space (circles), showing 
good agreement with FEM simulation predictions (dashed lines) along both the 
Γ–X (top) and Γ–M (bottom) directions. b, An enhancement of Q by a factor of 
more than 10 is observed over a wide range in momentum space for the merging- 
BIC samples (red and blue) compared to the isolated-BIC sample (purple) owing 
to topological protection. 


The Q values of the resonances are extracted by numerically fitting 
the scattering spectra with Lorentzian functions. As shown in Fig. 4a, Q 
increases from 2.6 × 10⁵ to 4.5 × 10⁵ as the observing point moves closer 
to the centre of the Brillouin zone from X to Z. This agrees well with the 
simulation results in Fig. 1. The highest Q observed in the merging-BIC 
sample is 4.9 × 10⁵ at point W (Fig. 4b). In comparison, the highest Q 
observed in the isolated-BIC sample (fabricated on the same wafer 
through the same processes as the merging-BIC sample, but with different 
structural parameters) is limited to only 4 × 10⁴, more than an 
order of magnitude lower (Fig. 4c). This confirms our simulation results 
in Fig. 2, which indicate that engineering the topological configurations 
of BICs can substantially suppress scattering losses. Furthermore, this 
over-ten-fold enhancement of the quality factor is observed to be robust: 
not only does it appear over a wide range in k space, as shown in Fig. 5, 
but a similar level of enhancement is also observed in all merging-BIC 
samples that we fabricated (see Supplementary Information section 
VIII for details). 

Topological photonics” ”’ has found tremendous success in sup- 
pressing in-plane back-scattering losses by breaking reciprocity. Here 
we use topology to solve a different class of problems, by suppress- 
ing out-of-plane-scattering losses in a reciprocal system. By merging 
multiple topological charges carried by BICs, we experimentally dem- 
onstrate photonic crystal resonances with record-high quality factors 
of Q = 4.9 × 10⁵, more than an order of magnitude higher than those of 
ordinary designs. These ultrahigh-Q resonances are potentially useful 
for chemical or biological sensing” and large-area laser applications”. 
Furthermore, our high-Q resonances are observed to be robust against 
fabrication imperfections, and can help to improve the performance 
of optoelectronic devices using concepts from topological photonics. 
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Methods 


Sample fabrication 

The sample was fabricated on a silicon-on-insulator wafer with electron-beam 
lithography (EBL), followed by inductively coupled plasma etching. 
For EBL, the silicon-on-insulator wafer was spin-coated with a 330-nm-thick 
layer of ZEP520A photoresist before being exposed to EBL (JBX-6300FS) 
at a beam current of 400 pA and a field size of 500 μm. The sample was then 
etched with ICP (Oxford Plasmapro Estrelas 100) using a mixture of SF₆ and 
C,F,. After etching, the resist was removed with N-methyl-2-pyrrolidone 
and the buried oxide layer was removed using 49% HF. 


Measurement system 

The incident light source was a tunable C+L-band telecommunication 
laser (Santec TSL-550), which was sent through a chopper for lock-in 
detection. A pinhole with a diameter of 500 μm was placed on the Fourier 
plane to select the desired wavevectors. Light scattered through the 
pinhole was collected by a photodiode (PDA10DT-EC), which was connected 
to a lock-in amplifier (SRS SR830). A flip mirror was used to switch 
between the camera that was used to image isofrequency contours and 
the photodiode. Besides characterizing far-field radiation patterns, the 
setup could also take near-field images of the sample if another lens was 
inserted into the optical path. 


Data availability 


The data that support the plots in this paper and other findings of this 
study are available from the corresponding author upon request. 
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The promise of quantum computers is that certain computational tasks might be 
executed exponentially faster on a quantum processor than on a classical processor’. A 
fundamental challenge is to build a high-fidelity processor capable of running quantum 
algorithms in an exponentially large computational space. Here we report the use of a 
processor with programmable superconducting qubits” ’ to create quantum states on 
53 qubits, corresponding to a computational state-space of dimension 2⁵³ (about 10¹⁶). 
Measurements from repeated experiments sample the resulting probability 
distribution, which we verify using classical simulations. Our Sycamore processor takes 
about 200 seconds to sample one instance of a quantum circuit a million times—our 
benchmarks currently indicate that the equivalent task for a state-of-the-art classical 
supercomputer would take approximately 10,000 years. This dramatic increase in 
speed compared to all known classical algorithms is an experimental realization of 
quantum supremacy® “ for this specific computational task, heralding a much- 
anticipated computing paradigm. 


In the early 1980s, Richard Feynman proposed that a quantum computer 
would be an effective tool with which to solve problems in physics 
and chemistry, given that it is exponentially costly to simulate large 
quantum systems with classical computers’. Realizing Feynman’s vision 
poses substantial experimental and theoretical challenges. First, can 
a quantum system be engineered to perform a computation in a large 
enough computational (Hilbert) space and with a low enough error 
rate to provide a quantum speedup? Second, can we formulate a prob- 
lem that is hard for a classical computer but easy for a quantum com- 
puter? By computing such a benchmark task on our superconducting 
qubit processor, we tackle both questions. Our experiment achieves 
quantum supremacy, a milestone on the path to full-scale quantum 
computing® “*, 


In reaching this milestone, we show that quantum speedup is achievable 
in a real-world system and is not precluded by any hidden physical 
laws. Quantum supremacy also heralds the era of noisy intermediate- 
scale quantum (NISQ) technologies”. The benchmark task we demon- 
strate has an immediate application in generating certifiable random 
numbers (S. Aaronson, manuscript in preparation); other initial uses 
for this new computational capability may include optimization”, 
machine learning’*, materials science and chemistry” *. However, 
realizing the full promise of quantum computing (using Shor’s algorithm 
for factoring, for example) still requires technical leaps to engineer 
fault-tolerant logical qubits>”’. 

To achieve quantum supremacy, we made a number of technical 
advances which also pave the way towards error correction. We 
developed fast, high-fidelity gates that can be executed simultaneously 
across a two-dimensional qubit array. We calibrated and benchmarked 
the processor at both the component and system level using a powerful 
new tool: cross-entropy benchmarking. Finally, we used component-level 
fidelities to accurately predict the performance of the whole system, 
further showing that quantum information behaves as expected 
when scaling to large systems. 


A suitable computational task 


To demonstrate quantum supremacy, we compare our quantum processor 
against state-of-the-art classical computers in the task of sampling 
the output of a pseudo-random quantum circuit. Random circuits 
are a suitable choice for benchmarking because they do not possess 
structure and therefore allow for limited guarantees of computational 
hardness. We design the circuits to entangle a set of quantum bits 
(qubits) by repeated application of single-qubit and two-qubit logical 
operations. Sampling the quantum circuit’s output produces a set 
of bitstrings, for example {0000101, 1011100, ...}. Owing to quantum 
interference, the probability distribution of the bitstrings resembles 
a speckled intensity pattern produced by light interference in laser 
scatter, such that some bitstrings are much more likely to occur than 
others. Classically computing this probability distribution becomes 
exponentially more difficult as the number of qubits (width) and number 
of gate cycles (depth) grow. 

We verify that the quantum processor is working properly using a 
method called cross-entropy benchmarking, which compares how 
often each bitstring is observed experimentally with its corresponding 
ideal probability computed via simulation on a classical computer. For 
a given circuit, we collect the measured bitstrings {x_i} and compute the 
linear cross-entropy benchmarking fidelity (see also Supplementary 
Information), which is the mean of the simulated probabilities of the 
bitstrings we measured: 


$$F_{\mathrm{XEB}} = 2^{n}\langle P(x_i)\rangle_i - 1 \qquad (1)$$


where n is the number of qubits, P(x_i) is the probability of bitstring x_i 
computed for the ideal quantum circuit, and the average is over the 
observed bitstrings. Intuitively, F_XEB is correlated with how often we 
sample high-probability bitstrings. When there are no errors in the 
quantum circuit, the distribution of probabilities is exponential (see 
Supplementary Information), and sampling from this distribution will 
produce F_XEB = 1. On the other hand, sampling from the uniform 
distribution will give ⟨P(x_i)⟩_i = 1/2^n and produce F_XEB = 0. Values of F_XEB 
between 0 and 1 correspond to the probability that no error has occurred 
while running the circuit. The probabilities P(x_i) must be obtained from 
classically simulating the quantum circuit, and thus computing F_XEB is 
intractable in the regime of quantum supremacy. However, with certain 
circuit simplifications, we can obtain quantitative fidelity estimates of 
a fully operating processor running wide and deep quantum circuits. 
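
The behaviour of the linear cross-entropy fidelity described above can be illustrated numerically. The following is a minimal sketch, not the procedure used in the experiment: it assumes that the ideal output of a random circuit can be mimicked by the squared amplitudes of a random complex Gaussian vector (which gives exponentially distributed probabilities), and that errors act as a simple mixture of ideal and uniform sampling. The function names and the 16-qubit size are illustrative choices.

```python
import numpy as np

def linear_xeb(ideal_probs, samples, n_qubits):
    """Linear XEB fidelity: 2^n * <P(x_i)>_i - 1 over the observed bitstrings."""
    return (2 ** n_qubits) * ideal_probs[samples].mean() - 1.0

def random_state_probs(n_qubits, rng):
    """Stand-in for an ideal random-circuit output (assumption): |amplitude|^2 of
    a random complex Gaussian vector, normalized to sum to one."""
    dim = 2 ** n_qubits
    amps = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    probs = np.abs(amps) ** 2
    return probs / probs.sum()

def sample_with_fidelity(ideal_probs, fidelity, n_samples, rng):
    """Draw each shot from the ideal distribution with probability `fidelity`,
    otherwise from the uniform distribution (simple depolarizing assumption)."""
    dim = ideal_probs.size
    ideal = rng.choice(dim, size=n_samples, p=ideal_probs)
    uniform = rng.integers(dim, size=n_samples)
    keep_ideal = rng.random(n_samples) < fidelity
    return np.where(keep_ideal, ideal, uniform)

rng = np.random.default_rng(0)
n = 16  # illustrative size; bitstrings are encoded as integers 0 .. 2^n - 1
probs = random_state_probs(n, rng)

xeb_ideal = linear_xeb(probs, sample_with_fidelity(probs, 1.0, 200_000, rng), n)
xeb_uniform = linear_xeb(probs, sample_with_fidelity(probs, 0.0, 200_000, rng), n)
xeb_half = linear_xeb(probs, sample_with_fidelity(probs, 0.5, 200_000, rng), n)

print(f"error-free sampling: {xeb_ideal:.3f}")
print(f"uniform sampling:    {xeb_uniform:.3f}")
print(f"half-and-half mix:   {xeb_half:.3f}")
```

Under these assumptions the three estimates should land close to 1, 0 and 0.5 respectively, up to statistical fluctuations from the finite state size and the finite number of samples.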

Our goal is to achieve a high enough F_XEB for a circuit with sufficient 
width and depth such that the classical computing cost is prohibitively 
large. This is a difficult task because our logic gates are imperfect and 
the quantum states we intend to create are sensitive to errors. A single 
bit or phase flip over the course of the algorithm will completely shuffle 
the speckle pattern and result in close to zero fidelity (see also 
Supplementary Information). Therefore, in order to claim quantum supremacy 
we need a quantum processor that executes the program with 
sufficiently low error rates. 


Building a high-fidelity processor 


We designed a quantum processor named ‘Sycamore’ which consists 
of a two-dimensional array of 54 transmon qubits, where each qubit is 
tunably coupled to four nearest neighbours, in a rectangular lattice. The 
connectivity was chosen to be forward-compatible with error correction 
using the surface code. A key systems engineering advance of this 
device is achieving high-fidelity single- and two-qubit operations, not 
just in isolation but also while performing a realistic computation with 
simultaneous gate operations on many qubits. We discuss the highlights 
below; see also the Supplementary Information. 

Fig. 1 | The Sycamore processor. a, Layout of processor, showing a rectangular 
array of 54 qubits (grey), each connected to its four nearest neighbours with 
couplers (blue). The inoperable qubit is outlined. b, Photograph of the 
Sycamore chip. 

In a superconducting circuit, conduction electrons condense into a 
macroscopic quantum state, such that currents and voltages behave 
quantum mechanically. Our processor uses transmon qubits, which 
can be thought of as nonlinear superconducting resonators at 5-7 GHz. 
The qubit is encoded as the two lowest quantum eigenstates of the 
resonant circuit. Each transmon has two controls: a microwave drive 
to excite the qubit, and a magnetic flux control to tune the frequency. 
Each qubit is connected to a linear resonator used to read out the qubit 
state. As shown in Fig. 1, each qubit is also connected to its neighbouring 
qubits using a new adjustable coupler. Our coupler design allows us 
to quickly tune the qubit-qubit coupling from completely off to 40 MHz. 
One qubit did not function properly, so the device uses 53 qubits and 
86 couplers. 

The processor is fabricated using aluminium for metallization and 
Josephson junctions, and indium for bump-bonds between two silicon 
wafers. The chip is wire-bonded to a superconducting circuit board 
and cooled to below 20 mK in a dilution refrigerator to reduce ambient 
thermal energy to well below the qubit energy. The processor is connected 
through filters and attenuators to room-temperature electronics, 
which synthesize the control signals. The state of all qubits can be read 
simultaneously by using a frequency-multiplexing technique. We use 
two stages of cryogenic amplifiers to boost the signal, which is digitized 
(8 bits at 1 GHz) and demultiplexed digitally at room temperature. In 
total, we orchestrate 277 digital-to-analog converters (14 bits at 1 GHz) 
for complete control of the quantum processor. 

We execute single-qubit gates by driving 25-ns microwave pulses resonant 
with the qubit frequency while the qubit-qubit coupling is turned 
off. The pulses are shaped to minimize transitions to higher transmon 
states. Gate performance varies strongly with frequency owing to 
two-level-system defects, stray microwave modes, coupling to control 
lines and the readout resonator, residual stray coupling between qubits, 
flux noise and pulse distortions. We therefore optimize the single-qubit 
operation frequencies to mitigate these error mechanisms. 

We benchmark single-qubit gate performance by using the cross-entropy 
benchmarking protocol described above, reduced to the single-qubit 
level (n = 1), to measure the probability of an error occurring 
during a single-qubit gate. On each qubit, we apply a variable number 
m of randomly selected gates and measure F_XEB averaged over many 
sequences; as m increases, errors accumulate and average F_XEB decays. 
We model this decay by [1 - e_1/(1 - 1/D^2)]^m where e_1 is the Pauli error 
probability. The state (Hilbert) space dimension term, D = 2^n, which equals 
2 for this case, corrects for the depolarizing model where states with 
errors partially overlap with the ideal state. This procedure is similar to 
the more typical technique of randomized benchmarking, but 
supports non-Clifford-gate sets and can separate out decoherence 
error from coherent control error. We then repeat the experiment with 
all qubits executing single-qubit gates simultaneously (Fig. 2), which 
shows only a small increase in the error probabilities, demonstrating 
that our device has low microwave crosstalk. 
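
As an illustration of how a Pauli error probability could be extracted from such a decay, the sketch below generates synthetic fidelities from the model [1 - e_1/(1 - 1/D^2)]^m and recovers e_1 with a log-linear fit. The error value, the noise level, the sequence lengths and the fitting procedure are all assumptions made for this example; the fitting method actually used in the experiment is not described in this section.

```python
import numpy as np

def xeb_decay(m, pauli_error, n_qubits=1):
    """Model fidelity after m gates: [1 - e/(1 - 1/D^2)]^m with D = 2^n."""
    dim = 2 ** n_qubits
    return (1.0 - pauli_error / (1.0 - 1.0 / dim**2)) ** m

def fit_pauli_error(m_values, fidelities, n_qubits=1):
    """Fit log(F) = m*log(p) + c by least squares, then convert the per-gate
    decay factor p back into a Pauli error e = (1 - p) * (1 - 1/D^2)."""
    dim = 2 ** n_qubits
    slope, _intercept = np.polyfit(m_values, np.log(fidelities), 1)
    decay_per_gate = np.exp(slope)
    return (1.0 - decay_per_gate) * (1.0 - 1.0 / dim**2)

# Synthetic data: hypothetical error of 0.15% per gate with 1% multiplicative noise.
rng = np.random.default_rng(1)
m_values = np.arange(10, 510, 50)
true_error = 0.0015
noisy = xeb_decay(m_values, true_error) * (1.0 + 0.01 * rng.normal(size=m_values.size))

estimated_error = fit_pauli_error(m_values, noisy)
print(f"true e_1 = {true_error:.5f}, fitted e_1 = {estimated_error:.5f}")
```

Leaving the intercept free in the fit means that a constant prefactor in the measured fidelity does not bias the per-gate error estimate.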

We perform two-qubit iSWAP-like entangling gates by bringing neighbouring 
qubits on-resonance and turning on a 20-MHz coupling for 12 ns, 
which allows the qubits to swap excitations. During this time, the qubits 
also experience a controlled-phase (CZ) interaction, which originates 
from the higher levels of the transmon. The two-qubit gate frequency 
trajectories of each pair of qubits are optimized to mitigate the same error 
mechanisms considered in optimizing single-qubit operation frequencies. 

To characterize and benchmark the two-qubit gates, we run two-qubit 
circuits with m cycles, where each cycle contains a randomly chosen 
single-qubit gate on each of the two qubits followed by a fixed two-qubit 
gate. We learn the parameters of the two-qubit unitary (such as the 
amount of iSWAP and CZ interaction) by using F_XEB as a cost function. 
After this optimization, we extract the per-cycle error e_2c from the decay 
of F_XEB with m, and isolate the two-qubit error e_2 by subtracting the two 
single-qubit errors e_1. We find an average e_2 of 0.36%. Additionally, we 
repeat the same procedure while simultaneously running two-qubit 
circuits for the entire array. After updating the unitary parameters to 
account for effects such as dispersive shifts and crosstalk, we find an 
average e_2 of 0.62%. 

For the full experiment, we generate quantum circuits using the two-qubit 
unitaries measured for each pair during simultaneous operation, 
rather than a standard gate for all pairs. The typical two-qubit gate is a 
full iSWAP with 1/6th of a full CZ. Using individually calibrated gates in 
no way limits the universality of the demonstration. One can compose, 
for example, controlled-NOT (CNOT) gates from 1-qubit gates and two 
of the unique 2-qubit gates of any given pair. The implementation of 
high-fidelity ‘textbook gates’ natively, such as CZ or √iSWAP, is work 
in progress. 

Finally, we benchmark qubit readout using standard dispersive 
measurement. Measurement errors averaged over the 0 and 1 states are 
shown in Fig. 2a. We have also measured the error when operating all 
qubits simultaneously, by randomly preparing each qubit in the 0 or 1 
state and then measuring all qubits for the probability of the correct 
result. We find that simultaneous readout incurs only a modest increase 
in per-qubit measurement errors. 
Fig. 2 | System-wide Pauli and measurement errors. a, Integrated histogram 
(empirical cumulative distribution function, ECDF) of Pauli errors (black, green, 
blue) and readout errors (orange), measured on qubits in isolation (dotted lines) 
and when operating all qubits simultaneously (solid). The median of each 
distribution occurs at 0.50 on the vertical axis. Average (mean) values are shown 
below. b, Heat map showing single- and two-qubit Pauli errors e_1 (crosses) and e_2 
(bars) positioned in the layout of the processor. Values are shown for all qubits 
operating simultaneously. 


Having found the error rates of the individual gates and readout, we 
can model the fidelity of a quantum circuit as the product of the 
probabilities of error-free operation of all gates and measurements. Our 
largest random quantum circuits have 53 qubits, 1,113 single-qubit gates, 
430 two-qubit gates, and a measurement on each qubit, for which we 
predict a total fidelity of 0.2%. This fidelity should be resolvable with a 
few million measurements, since the uncertainty on F_XEB is 1/√N_s, where 
N_s is the number of samples. Our model assumes that entangling larger 
and larger systems does not introduce additional error sources beyond 
the errors we measure at the single- and two-qubit level. In the next 
section we will see how well this hypothesis holds up. 
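
The product model above can be written out as a short calculation. This is a sketch only: the gate and measurement counts and the 0.62% two-qubit error come from the text, whereas the single-qubit error (0.16%) and the readout error (3.8%) are hypothetical placeholder values chosen for illustration, because this section does not state them.

```python
import math

def predicted_fidelity(n_1q, e_1q, n_2q, e_2q, n_meas, e_meas):
    """Product of the error-free probabilities of every gate and measurement."""
    return (1 - e_1q) ** n_1q * (1 - e_2q) ** n_2q * (1 - e_meas) ** n_meas

def statistical_uncertainty(n_samples):
    """Uncertainty on the linear XEB fidelity, 1/sqrt(N_s)."""
    return 1.0 / math.sqrt(n_samples)

fidelity = predicted_fidelity(
    n_1q=1113, e_1q=0.0016,   # single-qubit error: hypothetical placeholder
    n_2q=430, e_2q=0.0062,    # simultaneous two-qubit error quoted in the text
    n_meas=53, e_meas=0.038,  # readout error: hypothetical placeholder
)
print(f"predicted fidelity: {fidelity:.2%}")

for n_samples in (1_000_000, 3_000_000, 30_000_000):
    sigma = statistical_uncertainty(n_samples)
    print(f"N_s = {n_samples:>10,}: uncertainty {sigma:.1e}, "
          f"signal/uncertainty = {fidelity / sigma:.1f}")
```

With these placeholder inputs the product comes out at the level of a tenth of a percent, the same order as the 0.2% quoted above, and the loop shows how the signal-to-uncertainty ratio improves as the number of samples grows.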


Fidelity estimation in the supremacy regime 


The gate sequence for our pseudo-random quantum circuit generation 
is shown in Fig. 3. One cycle of the algorithm consists of applying 
single-qubit gates chosen randomly from {√X, √Y, √W} on all qubits, 
followed by two-qubit gates on pairs of qubits. The sequences of gates 
which form the ‘supremacy circuits’ are designed to minimize the circuit 
depth required to create a highly entangled state, which is needed for 
computational complexity and classical hardness. 

Fig. 3 | Control operations for the quantum supremacy circuits. a, Example 
quantum circuit instance used in our experiment. Every cycle includes a layer 
each of single- and two-qubit gates. The single-qubit gates are chosen randomly 
from {√X, √Y, √W}, where W = (X + Y)/√2 and gates do not repeat sequentially. 
The sequence of two-qubit gates is chosen according to a tiling pattern, 
coupling each qubit sequentially to its four nearest-neighbour qubits. The 
couplers are divided into four subsets (ABCD), each of which is executed 
simultaneously across the entire array corresponding to shaded colours. Here 
we show an intractable sequence (repeat ABCDCDAB); we also use different 
coupler subsets along with a simplifiable sequence (repeat EFGHEFGH, not 
shown) that can be simulated on a classical computer. b, Waveform of control 
signals for single- and two-qubit gates. 

Although we cannot compute F_XEB in the supremacy regime, we can 
estimate it using three variations to reduce the complexity of the circuits. 
In ‘patch circuits’, we remove a slice of two-qubit gates (a small fraction 
of the total number of two-qubit gates), splitting the circuit into two 
spatially isolated, non-interacting patches of qubits. We then compute 
the total fidelity as the product of the patch fidelities, each of which can 
be easily calculated. In ‘elided circuits’, we remove only a fraction of the 
initial two-qubit gates along the slice, allowing for entanglement 
between patches, which more closely mimics the full experiment while 
still maintaining simulation feasibility. Finally, we can also run full 
‘verification circuits’, with the same gate counts as our supremacy circuits, 
but with a different pattern for the sequence of two-qubit gates, 
which is much easier to simulate classically (see also Supplementary 
Information). Comparison between these three variations allows us to 
track the system fidelity as we approach the supremacy regime. 

We first check that the patch and elided versions of the verification 
circuits produce the same fidelity as the full verification circuits up to 
53 qubits, as shown in Fig. 4a. For each data point, we typically collect 
N_s = 5 × 10^6 total samples over ten circuit instances, where instances 
differ only in the choices of single-qubit gates in each cycle. We also 
show predicted F_XEB values, computed by multiplying the no-error 
probabilities of single- and two-qubit gates and measurement (see also 
Supplementary Information). The predicted, patch and elided fidelities all 
show good agreement with the fidelities of the corresponding full circuits, 
despite the vast differences in computational complexity and 
entanglement. This gives us confidence that elided circuits can be used 
to accurately estimate the fidelity of more-complex circuits. 

The largest circuits for which the fidelity can still be directly verified 
have 53 qubits and a simplified gate arrangement. Performing random 
circuit sampling on these at 0.8% fidelity takes one million cores 130 
seconds, corresponding to a million-fold speedup of the quantum processor 
relative to a single core. 

We proceed now to benchmark our computationally most difficult 
circuits, which are simply a rearrangement of the two-qubit gates. In 
Fig. 4b, we show the measured F_XEB for 53-qubit patch and elided versions 
of the full supremacy circuits with increasing depth. For the largest 
circuit with 53 qubits and 20 cycles, we collected N_s = 30 × 10^6 samples 
over ten circuit instances, obtaining F_XEB = (2.24 ± 0.21) × 10^-3 for the 
elided circuits. With 5σ confidence, we assert that the average fidelity 
of running these circuits on the quantum processor is greater than at 
least 0.1%. We expect that the full data for Fig. 4b should have similar 
fidelities, but since the simulation times (red numbers) take too long to 
check, we have archived the data (see ‘Data availability’ section). The 
data is thus in the quantum supremacy regime. 
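
The 5σ statement can be checked with one line of arithmetic, assuming that the quoted uncertainty of 0.21 × 10^-3 on the elided-circuit fidelity of 2.24 × 10^-3 is the 1σ value (combining systematic and statistical contributions):

```latex
$$F_{\mathrm{XEB}} - 5\sigma = (2.24 - 5 \times 0.21) \times 10^{-3}
  = 1.19 \times 10^{-3} > 1 \times 10^{-3} = 0.1\%$$
```

Under that reading, the lower edge of the 5σ interval still lies above the 0.1% threshold.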


The classical computational cost 


We simulate the quantum circuits used in the experiment on classical 
computers for two purposes: (1) verifying our quantum processor and 
benchmarking methods by computing F_XEB where possible using 
simplifiable circuits (Fig. 4a), and (2) estimating F_XEB as well as the classical 
cost of sampling our hardest circuits (Fig. 4b). Up to 43 qubits, we use 
a Schrödinger algorithm, which simulates the evolution of the full quantum 
state; the Jülich supercomputer (with 100,000 cores, 250 terabytes) 
runs the largest cases. Above this size, there is not enough random access 
memory (RAM) to store the quantum state. For larger qubit numbers, 
we use a hybrid Schrödinger-Feynman algorithm running on Google 
data centres to compute the amplitudes of individual bitstrings. This 
algorithm breaks the circuit up into two patches of qubits and efficiently 
simulates each patch using a Schrödinger method, before connecting 
them using an approach reminiscent of the Feynman path-integral. 
Although it is more memory-efficient, the Schrödinger-Feynman algorithm 
becomes exponentially more computationally expensive with 
increasing circuit depth owing to the exponential growth of paths with 
the number of gates connecting the patches. 
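
The memory limit on full-state (Schrödinger) simulation follows from simple counting: an n-qubit state has 2^n complex amplitudes. The sketch below assumes 16 bytes per amplitude (double-precision complex numbers) and decimal terabytes; neither assumption is stated in the text, and the function names are illustrative.

```python
def statevector_bytes(n_qubits, bytes_per_amplitude=16):
    """Memory needed to hold all 2^n complex amplitudes of an n-qubit state."""
    return (2 ** n_qubits) * bytes_per_amplitude

def max_qubits(memory_bytes, bytes_per_amplitude=16):
    """Largest n whose full state vector fits in the given memory."""
    n = 0
    while statevector_bytes(n + 1, bytes_per_amplitude) <= memory_bytes:
        n += 1
    return n

memory = 250 * 10**12  # the 250 terabytes quoted for the Jülich machine
for n in (43, 44, 53):
    print(f"{n} qubits: {statevector_bytes(n) / 1e12:,.0f} TB")
print(f"largest state that fits in 250 TB: {max_qubits(memory)} qubits")
```

Under these assumptions a 43-qubit state needs about 141 TB, while 44 qubits already exceed 250 TB, consistent with the 43-qubit limit mentioned above. A 53-qubit state would need roughly a thousand times more memory again.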

To estimate the classical computational cost of the supremacy circuits 
(grey numbers in Fig. 4b), we ran portions of the quantum circuit simulation 
on both the Summit supercomputer as well as on Google clusters 
and extrapolated to the full cost. In this extrapolation, we account for 
the computation cost of sampling by scaling the verification cost with 
F_XEB; for example, a 0.1% fidelity decreases the cost by about 1,000. 
On the Summit supercomputer, which is currently the most powerful 
in the world, we used a method inspired by Feynman path-integrals that 
is most efficient at low depth. At m = 20 the tensors do not reasonably 
fit into node memory, so we can only measure runtimes up to m = 14, 
for which we estimate that sampling three million bitstrings with 1% 
fidelity would require a year. 

On Google Cloud servers, we estimate that performing the same task 
for m = 20 with 0.1% fidelity using the Schrödinger-Feynman algorithm 
would cost 50 trillion core-hours and consume one petawatt hour of 
energy. To put this in perspective, it took 600 seconds to sample the 
circuit on the quantum processor three million times, where sampling 
time is limited by control hardware communications; in fact, the net 
quantum processor time is only about 30 seconds. The bitstring samples 
from all circuits have been archived online (see ‘Data availability’ section) 
to encourage development and testing of more advanced verification 
algorithms. 

Fig. 4 | Demonstrating quantum supremacy. a, Verification of benchmarking 
methods. F_XEB values for patch, elided and full verification circuits are 
calculated from measured bitstrings and the corresponding probabilities 
predicted by classical simulation. Here, the two-qubit gates are applied in a 
simplifiable tiling and sequence such that the full circuits can be simulated out 
to n = 53, m = 14 in a reasonable amount of time. Each data point is an average over 
ten distinct quantum circuit instances that differ in their single-qubit gates (for 
n = 39, 42 and 43 only two instances were simulated). For each n, each instance is 
sampled with N_s of 0.5-2.5 million. The black line shows the predicted F_XEB based 
on single- and two-qubit gate and measurement errors. The close 
correspondence between all four curves, despite their vast differences in 
complexity, justifies the use of elided circuits to estimate fidelity in the 
supremacy regime. b, Estimating F_XEB in the quantum supremacy regime. Here, 
the two-qubit gates are applied in a non-simplifiable tiling and sequence for 
which it is much harder to simulate. For the largest elided data (n = 53, m = 20, 
total N_s = 30 million), we find an average F_XEB > 0.1% with 5σ confidence, where σ 
includes both systematic and statistical uncertainties. The corresponding full 
circuit data, not simulated but archived, is expected to show similarly 
statistically significant fidelity. For m = 20, obtaining a million samples on the 
quantum processor takes 200 seconds, whereas an equal-fidelity classical 
sampling would take 10,000 years on a million cores, and verifying the fidelity 
would take millions of years. 

One may wonder to what extent algorithmic innovation can enhance 
classical simulations. Our assumption, based on insights from complexity 
theory, is that the cost of this algorithmic task is exponential in 
circuit size. Indeed, simulation methods have improved steadily over the 
past few years. We expect that lower simulation costs than reported 
here will eventually be achieved, but we also expect that they will be 
consistently outpaced by hardware improvements on larger quantum 
processors. 


Verifying the digital error model 


A key assumption underlying the theory of quantum error correction 
is that quantum state errors may be considered digitized and localized. 
Under such a digital model, all errors in the evolving quantum 
state may be characterized by a set of localized Pauli errors (bit-flips or 
phase-flips) interspersed into the circuit. Since continuous amplitudes 
are fundamental to quantum mechanics, it needs to be tested whether 
errors in a quantum system could be treated as discrete and probabilistic. 
Indeed, our experimental observations support the validity of 
this model for our processor. Our system fidelity is well predicted by a 
simple model in which the individually characterized fidelities of each 
gate are multiplied together (Fig. 4). 

To be successfully described by a digitized error model, a system 
should be low in correlated errors. We achieve this in our experiment by 
choosing circuits that randomize and decorrelate errors, by optimizing 
control to minimize systematic errors and leakage, and by designing 
gates that operate much faster than correlated noise sources, such as 
1/f flux noise. Demonstrating a predictive uncorrelated error model 
up to a Hilbert space of size 2^53 shows that we can build a system where 
quantum resources, such as entanglement, are not prohibitively fragile. 


The future 


Quantum processors based on superconducting qubits can now perform 
computations in a Hilbert space of dimension 2^53 ≈ 9 × 10^15, beyond the 
reach of the fastest classical supercomputers available today. To our 
knowledge, this experiment marks the first computation that can be 
performed only on a quantum processor. Quantum processors have 
thus reached the regime of quantum supremacy. We expect that their 
computational power will continue to grow at a double-exponential 
rate: the classical cost of simulating a quantum circuit increases 
exponentially with computational volume, and hardware improvements will 
probably follow a quantum-processor equivalent of Moore’s law, 
doubling this computational volume every few years. To sustain the 
double-exponential growth rate and to eventually offer the computational 
volume needed to run well known quantum algorithms, such as 
the Shor or Grover algorithms, the engineering of quantum error 
correction will need to become a focus of attention. 
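
The double-exponential claim can be made explicit with a short derivation. The symbols here are introduced only for illustration and do not appear in the text: V(t) is the computational volume at time t, T its doubling time, and α a constant setting how fast the classical simulation cost C grows with volume.

```latex
$$V(t) = V_0\, 2^{t/T}, \qquad C \propto 2^{\alpha V}
  \quad\Longrightarrow\quad C(t) \propto 2^{\alpha V_0\, 2^{t/T}}$$
```

An exponential growth in volume, fed into a cost that is itself exponential in volume, gives a classical cost that is exponential in an exponential of time.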

The extended Church-Turing thesis formulated by Bernstein and 
Vazirani asserts that any ‘reasonable’ model of computation can be 
efficiently simulated by a Turing machine. Our experiment suggests 
that a model of computation may now be available that violates this 
assertion. We have performed random quantum circuit sampling in 
polynomial time using a physically realizable quantum processor (with 
sufficiently low error rates), yet no efficient method is known to exist for 
classical computing machinery. As a result of these developments, quantum 
computing is transitioning from a research topic to a technology 
that unlocks new computational capabilities. We are only one creative 
algorithm away from valuable near-term applications. 


Data availability 
The data are available at a public Dryad repository (https://doi.org/10.5061/dryad.k6t1rjg). 
Over the past few decades, several molecular cages, hosts and nanoporous materials 
enclosing nanometre-sized cavities have been reported, including 
coordination-driven nanocages. Such nanocages have found widespread use in molecular 
recognition, separation, stabilization and the promotion of unusual chemical 
reactions, among other applications. Most of the reported nanospaces within 
molecular hosts are confined by aromatic walls, the properties of which help to 
determine the host-guest behaviour. However, cages with nanospaces surrounded by 
antiaromatic walls have not yet been developed, owing to the instability of 
antiaromatic compounds; as such, the effect of antiaromatic walls on the properties of 
nanospaces remains unknown. Here we demonstrate the construction of an 
antiaromatic-walled nanospace within a self-assembled cage composed of four metal 
ions with six identical antiaromatic walls. Calculations indicate that the magnetic 
effects of the antiaromatic moieties surrounding this nanospace reinforce each other. 
This prediction is confirmed by ¹H nuclear magnetic resonance (NMR) signals of bound 
guest molecules, which are observed at chemical shift values of up to 24 parts per 
million (ppm), owing to the combined antiaromatic deshielding effect of the 
surrounding rings. This value, shifted 15 ppm from that of the free guest, is the largest 
¹H NMR chemical shift displacement resulting from an antiaromatic environment 
observed so far. This cage may thus be considered as a type of NMR shift reagent, 
moving guest signals well beyond the usual NMR frequency range and opening the way 
to further probing the effects of an antiaromatic environment on a nanospace. 


Aromaticity and antiaromaticity are fundamental concepts in chemistry, 
and a long-standing challenge is the preparation of antiaromatic molecules 
and the study of their properties. Most cavities within coordination 
cages can be considered as ‘aromatic-walled nanospaces’ owing to 
the aromatic character of the surrounding walls (Fig. 1a). Aromatic-walled 
nanospaces are characterized by an intermolecular aromatic 
NMR-shielding effect, in which the nuclei of included guests experience a 
weaker magnetic field than the one applied, with endohedral fullerenes 
providing a canonical example of such NMR effects. By contrast, a cavity 
surrounded by antiaromatic walls would experience deshielding (Fig. 1b), 
or enhancement of the external magnetic field, because an antiaromatic 
ring generates an induced magnetic field in the opposite direction to that 
of an aromatic ring. The creation of an antiaromatic-walled nanospace is 
a challenging task because it requires the precise placement of unstable 
antiaromatic walls around the central cavity. The instability of these walls, 
reflected in a high degree of chemical reactivity, is a consequence of the 
electronic structure of antiaromatic molecules. 

Antiaromatic compounds, which have a cyclic and planar π-conjugated 
system with 4n π electrons, are generally quite reactive: electrons are 
relatively easy both to add and to remove, and such compounds may 
react as diradicals. To utilize antiaromatic rings as building blocks 
for antiaromatic-walled nanospaces, both high stability and strong 
antiaromaticity are required, a set of properties that initially seem 
mutually incompatible. After investigating different preparation strategies 
that have been reported recently for antiaromatic systems, we 
came upon the work of Shinokubo and co-workers, who reported the 
facile synthesis of an antiaromatic porphyrinoid with 16 π electrons, 
Ni(II)-dimesitylnorcorrole (1; Extended Data Fig. 1). Despite the strong 
antiaromaticity over the entire surface of the central moiety of 1, it is 
stable under ambient conditions. This unusual stability permitted access 
to functionalization of 1 by several chemical reactions. We thus set 
out to use 1 as a building block for an antiaromatic-walled nanospace 
through subcomponent self-assembly. 

Di(aniline)-based subcomponent 2 was synthesized from 1 in three 
steps on the basis of published methods (Extended Data Fig. 1). 
Antiaromatic cage 3 was then constructed using subcomponent 
self-assembly. Diamine 2 (6 equiv.), 2-formylpyridine (12 equiv.) and Fe(II) 
bis(trifluoromethanesulfonyl)imide (NTf2⁻) (4 equiv.) were mixed 
in CH3CN, resulting in the formation of Fe(II)4L6 cage 3 as the uniquely 
observed product (Fig. 2a). The structure of 3 was characterized by NMR 
spectroscopy, mass spectrometry (MS) and X-ray crystallography. All 
of the ¹H NMR signals were assigned using different NMR techniques 
(Supplementary Figs. 10-21). A set of signals for norcorrole moieties 
(Hg--) was observed at 1.76-2.02 ppm, in the same high-field region as 
1 and 2, indicating that the antiaromaticity of the norcorrole skeleton 
was retained following assembly into cage 3 (Fig. 2c, Supplementary 
Fig. 10). The signals of the bulky mesityl groups (H,,.), which cannot freely 
rotate, split in two, corresponding to environments inside and outside 
the cage. The phenylene signals (H, ;,) were considerably broadened at 
room temperature. We thus infer that phenylene rotation is restricted 
by the nearby mesityl and pyridine moieties. However, a set of four sharp 
phenylene doublets was observed at 243 K, which correspond to protons 
inside and outside the cage, as with the mesityl signals (Fig. 2d, 
Supplementary Fig. 11). The diffusion-ordered spectroscopy (DOSY) 
NMR spectrum shows a single band with a diffusion coefficient (D) of 
3.98 × 10^-10 m^2 s^-1, which corresponds to a diameter of 3 of about 3 nm 
(Supplementary Fig. 21). Prominent signals of the antiaromatic cage 
were observed under standard electrospray ionization time-of-flight 
(ESI-TOF) MS conditions (Supplementary Fig. 22). 

Fig. 1 | Cartoon representations of nanospaces. a, b, An aromatic-walled 
nanospace surrounded by aromatic walls (a) and an antiaromatic-walled 
nanospace with antiaromatic walls (b). B_0 is the applied magnetic field. 

[Fig. 2a reaction scheme: Fe(NTf2)2 (4 equiv.), 2-formylpyridine (12 equiv.), CH3CN, r.t., overnight, 95%.] 
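
The link between the DOSY diffusion coefficient and the quoted size of about 3 nm can be sketched with the Stokes-Einstein relation for a sphere, d = k_B T / (3π η D). This is an illustrative estimate rather than the authors' stated procedure: it takes D = 3.98 × 10^-10 m^2 s^-1, and assumes a temperature of 298 K and a solvent viscosity of about 0.34 mPa s (a typical value for acetonitrile near room temperature), neither of which is given alongside the measurement.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K

def hydrodynamic_diameter(diffusion, temperature, viscosity):
    """Stokes-Einstein diameter (m) of a sphere with the given diffusion
    coefficient (m^2/s), temperature (K) and solvent viscosity (Pa s)."""
    return BOLTZMANN * temperature / (3 * math.pi * viscosity * diffusion)

diameter_m = hydrodynamic_diameter(
    diffusion=3.98e-10,  # DOSY value for cage 3
    temperature=298.0,   # assumed measurement temperature
    viscosity=3.4e-4,    # assumed viscosity of acetonitrile
)
print(f"estimated diameter: {diameter_m * 1e9:.1f} nm")
```

With these inputs the estimate comes out slightly above 3 nm, in line with the size stated above. The result scales inversely with the assumed viscosity.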

X-ray crystallographic analysis provided unambiguous evidence for 
the formation of the Fe(II)₄L₆ structure of 3. Dark-red single crystals of 3 
were obtained by slow diffusion of Et₂O into a solution of 3 in CH₃CN. 
Six ligands bridge four octahedral Fe(II) centres to provide the expected 
tetrahedral cage with T symmetry (Fig. 3a, Supplementary Figs. 23, 24, 
Supplementary Table 1), with four apertures of ~3.3 Å (between proximal 
H,-H, on adjacent norcorrole units) on the faces. All Fe(II) centres in each 
tetrahedron have the same handedness (ΔΔΔΔ or ΛΛΛΛ). The metal–metal 
distances are 21.9 Å for Fe(II)–Fe(II) and 14.6 Å between Ni(II)–Ni(II) antipodes. 
Each norcorrole wall displays a 165.4(3)° (uncertainties are 1σ) bend 
inwards (measured as the C3–Ni–C12 angles; Supplementary Fig. 25). 
Although norcorrole 1 and previously reported 3,12-substituted norcorroles 
are planar (180.0°), the ditopic norcorrole walls of 3 bow inwards. 
As a result of this bending, the face apertures are minimized via stacking 
between mesityl groups and the neighbouring norcorrole edges in the 
crystal. This conformation is also present in solution, as indicated by 
nuclear Overhauser effect spectroscopy (NOESY), where correlations 
are observed between the mesityl and norcorrole moieties (Hi Hy 
and Hy'-Hg -) of 3 (Supplementary Figs. 17, 26). The cavity volume for 
the X-ray crystal structure was estimated using the PLATON program 
to be 1,150 Å³ (Supplementary Fig. 24), substantially less than the 1,950 Å³ 




Fig. 2 | Synthesis and NMR characterization of 3. a, Subcomponent self-assembly of antiaromatic cage 3. b–d, ¹H NMR spectra (500 MHz) of subcomponent 2 in 
CDCl₃ at 298 K (b) and 3 in CD₃CN at 298 K (c) and 243 K (d). r.t., room temperature. 

Fig. 3 | Crystal structure of 3 and NICS calculations. a, X-ray crystal structure 
of 3 (AAAA enantiomer) in stick representation with a three-dimensional NICS 
grid, showing the magnetic deshielding experienced within the antiaromatic-walled 
nanospace (displayed only for NICS,,, > 3; a front layer was sliced off to 
show the inside region. A more complete view of this three-dimensional NICS 
grid is shown in the Supplementary Video). b, c, Cross-sections of bis(Fe(II)) model 
complex 3′ (b) and 3 (c), obtained from two-dimensional NICS calculations. 
d, e, NICS slice plots of 3′ (d) and 3 (e) on a 0.25-Å-resolution grid, calculated 
using the crystal structure at the B3LYP/SDD (for Ni, Fe) and 6-31G(d) (for C, N, H) 
levels. Red and blue represent deshielding (positive) and shielding (negative) 
zones, respectively. The chemical structure is overlaid in white as a visual aid. 


volume of a model having planar norcorrole walls (Supplementary 
Fig. 27). 

The ultraviolet–visible–near-infrared (UV–vis–NIR) spectra of 3 and 2 
are shown in Supplementary Fig. 28. Subcomponent 2 displayed broad 
absorption bands around 600 nm and 1,000 nm. The former is assigned 
to intramolecular charge transfer from HOMO − 4 to the LUMO, and 
the latter is a characteristic band for antiaromatic porphyrinoids. 
These peaks were substantially broadened following cage formation 
in the spectrum of 3. These results are consistent with time-dependent 
density functional theory (TD-DFT) calculations of the spectra of 2 and 
3′. Complex 3′ is a computable model of a single edge of 3: a bis(Fe(II)) 
complex, in which each iron centre is bound to one edge of a 2′ ligand 
and two (E)-N-phenyl-(pyridin-2-yl)methanimine ligands. 


We carried out cyclic voltammetry experiments (Supplementary 
Fig. 29) to investigate the electrochemical properties of 2 and 3. Whereas 
subcomponent 2 displayed three reversible reduction peaks and one 
irreversible oxidation peak, attributed to oxidation of the primary 
amines, cage 3 gave a more complex collection of oxidation and reduction 
waves. The reduction peak at −2.29 V was attributed to the Fe(II)-coordinated 
pyridylimines. The multiple overlapping waves from −2.0 
to +0.5 V probably arose from electrochemical communication among 
the norcorrole walls of 3. The calculated HOMO–LUMO gaps of 1, 2 and 
3′ are the same (that is, 1.5 eV; Supplementary Fig. 30), which is consistent 
with 1 and 2 having the same gap between their first reduction and 
oxidation (Supplementary Fig. 29). 

To investigate the extent of the antiaromaticity experienced within 
the void volume of 3, we carried out nucleus-independent chemical 
shift (NICS) calculations. Calculated NICS(0) values of the norcorrole 
moieties are given in Supplementary Fig. 31. For the norcorrole walls in 1, 
2 and 3, the large positive values at the centre of each ring that includes 
the nickel atom are consistent with strong cavity-wall antiaromatic- 
ity. This result prompted us to seek further insight into the environ- 
ment of the central nanospace. The two-dimensional NICS,,, plot of 3 
orthogonal to the ring revealed an enhanced antiaromaticity-induced 
magnetic field within the cavity in comparison with model complex 3’ 
(Fig. 3b-e). The graphic (Fig. 3a) and animation (Supplementary Video) 
provide a three-dimensional visualization of this field, as calculated 
using a three-dimensional NICS,,, grid. The calculated NICS(n) at the 
centroid of 3 is δ = +7.4 ppm, which is approximately six times larger 
than the NICS(n) value of a corresponding point at the same distance 
from 3’ (Supplementary Fig. 32a, b). This result indicates that the six 
norcorrole walls of 3 have an additive effect on the antiaromaticity influ- 
ence of the central environment. Furthermore, positive NICS(n) values 
around the centroid within 3 were consistently high, whereas those at 
the same coordinates of 3’ decreased as the distance increased from the 
Ni centre (Supplementary Fig. 32c). These computational studies thus 
support cooperativity between the antiaromatic walls in increasing the 
antiaromatic character within 3. The anisotropy of the induced current 
density (ACID) of 1, 2 and 3, which traces out ring currents, also supports 
the results of NICS (Supplementary Fig. 33). 

Host-guest studies were conducted to investigate experimentally 
the effect of guest binding within the antiaromatic-walled cavity of 3. 
When 3 (1.0 mg, 0.12 µmol) and coronene (4; 5 equiv.) were mixed in 
CD₃CN, formation of the 1:2 host–guest complex 3·(4)₂ was observed by 
MS (Supplementary Fig. 37) and NMR (Fig. 4a) analyses. In the ¹H NMR 
spectrum of 3·(4)₂, the host signals were observed at similar chemical 
shift values to those of the empty host. Remarkably, however, the encapsulated 
guest signal was shifted downfield by 8.1 ppm compared to the 
free guest (Fig. 4c, Supplementary Fig. 37), as a result of the antiaromatic 
deshielding effect from the surrounding norcorrole walls. This down- 
field signal was observed to diffuse at the same rate as 3 in the DOSY 
spectrum (Supplementary Fig. 36). A heteronuclear single quantum 
coherence (HSQC) spectrum allowed this new signal to be unambigu- 
ously assigned to the encapsulated coronenes (Supplementary Fig. 35). 

Similarly, the treatment of corannulene (5), dibenzo[g,p]chrysene (6), 
truxene (7), carbon nanobelt (8) and N-methylfulleropyrrolidine (9) 
with 3 in acetonitrile gave rise to 1:2 or 1:1 host–guest complexes, probably 
driven by a combination of dispersion forces, aromatic stacking, 
CH–π interactions and solvophobic effects (Fig. 4b, Supplementary 
Figs. 38–60). The polycyclic aromatic hydrocarbons 6–9, which have low 
solubility in acetonitrile, all showed encapsulation after mixing a solution 
of 3 with an excess of solid guest at room temperature. The signals of all 
encapsulated guest molecules showed substantial downfield shifting 
within the antiaromatic-walled nanospace. The two encapsulated molecules 
of D₂-symmetric 6 displayed eight peaks in the range 9–22 ppm 
(Δδ = +1.7–13.4 ppm; Fig. 4d, Supplementary Figs. 42–46). The signals 
of the aromatic and aliphatic moieties in C₃ₕ-symmetric 7 appeared 
in the downfield region at 8–19 ppm (Δδ = +0.7–13.6 ppm; Fig. 4e, 

Fig. 4 | Host–guest chemistry within antiaromatic-walled nanospace. a, 
Encapsulation of 4 and an MM3-optimized structure of 3·(4)₂ based on the 
crystal structure of 3. b, Molecules observed to bind within 3. c–f, Partial ¹H NMR 
spectra (500 MHz, CD₃CN, 298 K) of 3·(4)₂ (c), 3·(6)₂ (d), 3·(7)₂ (e) and 3·8 (f), 
showing the signals for the encapsulated guests in the downfield region (red 


Supplementary Figs. 47-52). Two sets of guest signals were observed 
in the case of 7. These signals are attributed to diastereomeric dimeric 
guest configurations within the chiral cavity of 3. The most extreme 
downfield shifting was observed for a carbon nanobelt (8). The signals 
of the bay hydrogen atoms appeared at 23 ppm, shifted by +14.9 ppm 
compared with free 8, whereas the outer hydrogen signals were shifted 
by only +6.5 ppm (Fig. 4f, Supplementary Figs. 53-55). Among the set of 
guests encapsulated (Fig. 4b), the extent of the shift (Δδ) varies from 0.7 
to 14.9 ppm (Supplementary Fig. 71), depending on the guest and proton 
position. To obtain further insight into the antiaromaticity-influenced 

circles, aromatic signals; green triangles, aliphatic signals). g, Co-encapsulation 
of 5 and 10, starting from 3·(10)₃ with MM3-optimized structures (the front 
norcorrole walls are transparent for clarity). h, ¹H NMR spectra (500 MHz, 
CD₃CN, 298 K) of 3·(10)₃ (top) and 3·(5·10·5) (bottom). Red circle, central 10; grey 
square, outer 10; yellow triangle, outer 5. 


environment within 3, we conducted NICS calculations at various key 
points in this nanospace (Supplementary Fig. 32a, b). In contrast to the 
positive NICS values around the norcorrole walls, the vertex and aperture 
sites show negative values, which indicate shielding caused by aromatic- 
ity. Consequently, protons that are localized near vertices and apertures 
show relatively minor downfield shifting (Supplementary Figs. 70, 72). 
The overall association constant Kₐ for the 1:2 host–guest complex 3·(5)₂ 
was estimated to be 4.3 x 10° M * (+45%) in CD,CN at 263 K (Supplemen- 
tary Figs. 73, 74). The association constants for the other guests could 
not be determined because of their low solubility in acetonitrile. 


When 3 and tetraoxa[8]circulene (10) were mixed in acetonitrile, the 
1:3 host–guest complex 3·(10)₃ was formed, without notable effect on 
guest relaxation times (Fig. 4g, h, Supplementary Figs. 61–67), resulting 
in full occupation of the antiaromatic-walled nanospace. Within the 
three-guest stack, the central and outer molecules of 10 are influenced 
differently by the deshielding effects of the surrounding norcorroles. 
Whereas the central guest displayed a Δδ shift of +12.8 ppm, the outer 
ones only shifted by +2.8 ppm. 

A remarkable hetero-guest ‘sandwich’, 3·(5·10·5), with one circulene 
intercalated between two corannulenes, was observed to form selectively 
when 5 (40 equiv.) was added to an acetonitrile solution of 3·(10)₃, 
as confirmed by NMR and electrospray ionization mass spectrometry 
(ESI-MS) data (Fig. 4g, h, Supplementary Figs. 68, 69). Interestingly, 
the outer corannulenes show large downfield shifts (Δδ = +6.3 ppm), 
similar to 3·(5)₂ (Δδ = +6.6 ppm), in contrast to the smaller downfield 
shifts of the outer molecules of 10 in 3·(10)₃ (Δδ = +2.8 ppm). The central 
10 experienced a similar degree of antiaromatic ring current as in 
3·(10)₃ (Δδ = +12.0 ppm versus +12.8 ppm, respectively). We infer that 
the smaller, concave nature of 5 led to a positioning of its protons within 
3·(5·10·5), which induced a stronger antiaromatic deshielding effect 
compared to the outer equivalents of 10 in 3·(10)₃ (Fig. 4h). 

Cage 3 thus serves as a kind of NMR shift reagent that acts without 
notable shortening of the nuclear relaxation times of guest species, 
as opposed to paramagnetic NMR shift reagents. This result confirms 
theoretical predictions of intermolecular effects involving multiple 
antiaromatic molecules. Future work will explore chemical reactivity 
within this kind of nanospace. 

Methods 


Synthesis of 3,12-di(4-aminophenyl) Ni(II)-dimesitylnorcorrole (2) 
To a 100-ml two-necked glass flask, a mixture of di(4-nitrophenyl) Ni(II)-dimesitylnorcorrole 
1b (45.5 mg, 0.0555 mmol), SnCl₂·2H₂O (1.25 g, 
5.55 mmol) and dry THF (40 ml) was added under N₂. The resulting solution 
was stirred at 70 °C overnight. The mixture was poured into AcOEt 
(100 ml) and washed with saturated NaHCO₃ (aq.), water and brine. 
The organic phase was dried over MgSO₄, filtered and concentrated 
under reduced pressure. The crude product was purified by preparative 
TLC on a silica gel plate with CH₂Cl₂ to afford 2 as a dark brown solid 
(19.4 mg, 46%). 
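
As a hedged illustration of how the quoted 46% yield follows from the masses above, the sketch below recomputes it from the molar mass of 2. The formula C48H40N6Ni is read from the HR-MS entry for 2 and should be treated as an assumption here; the atomic weights are standard values and the helper names are hypothetical.

```python
# Standard atomic weights in g/mol (rounded)
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "Ni": 58.693}


def molar_mass(formula):
    """Molar mass (g/mol) from a dict mapping element symbol to atom count."""
    return sum(ATOMIC_WEIGHT[element] * count for element, count in formula.items())


def percent_yield(product_mg, product_molar_mass, limiting_mmol):
    """Isolated yield in percent; mg divided by g/mol gives mmol."""
    return 100.0 * (product_mg / product_molar_mass) / limiting_mmol


# Assumed formula of diamine 2 (taken from the HR-MS line, not confirmed here)
M_2 = molar_mass({"C": 48, "H": 40, "N": 6, "Ni": 1})
yield_2 = percent_yield(product_mg=19.4, product_molar_mass=M_2, limiting_mmol=0.0555)
print(f"M(2) = {M_2:.1f} g/mol, yield = {yield_2:.0f}%")
```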

¹H NMR (500 MHz, CDCl₃, 298 K): δ 1.73 (s, 6H, para-Mes), 1.74 (d, 
J = 4.0 Hz, 2H, β-CH), 1.86 (s, 2H, β-CH), 1.90 (d, J = 4.0 Hz, 2H, β-CH), 2.72 
(s, 12H, ortho-Mes), 3.19 (br, 4H, NH₂), 5.30 (d, J = 8.5 Hz, 4H, Ph), 5.59 (d, 
J = 8.5 Hz, 4H, Ph), 5.99 (s, 4H, Mes). 

¹³C NMR (125 MHz, CDCl₃, 298 K): δ 18.0 (CH₃), 20.6 (CH₃), 112.3 (CH), 
113.5 (CH), 115.9 (CH), 122.2 (Cq), 125.5 (CH), 125.8 (CH), 126.7 (Cq), 127.7 
(CH), 133.8 (Cq), 136.7 (Cq), 144.4 (Cq), 145.3 (Cq), 147.7 (Cq), 153.6 (Cq), 
156.2 (Cq), 160.3 (Cq), 171.1 (Cq). 

HR MS (ESI-TOF, CH₂Cl₂) m/z: [M]⁺ calcd for C₄₈H₄₀N₆Ni, 758.2662; 
found 758.2648. 


Formation of antiaromatic-walled nanospace 3 
Subcomponent 2 (50.0 mg, 0.0658 mmol), Fe(NTf₂)₂·4.5H₂O (30.4 mg, 
0.0436 mmol), 2-formylpyridine (14.0 mg, 0.131 mmol) and CH₃CN 
(25 ml) were added to a 50-ml round-bottom flask, and the mixture 
was stirred at room temperature overnight. The dark-red solution 
was concentrated to 5 ml under reduced pressure and poured into Et₂O 
(50 ml). The resulting solid was collected by centrifugation, washed 
with additional Et₂O and then dried to give 3 as a dark-red solid (83.7 mg, 
0.0103 mmol, 95%). 
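
The stoichiometry behind this yield can be sketched as follows. The example assumes that one cage is built from six molecules of 2, four Fe(II) ions and twelve molecules of 2-formylpyridine (two imine condensations per diamine); the dictionary keys and function name are hypothetical labels introduced for illustration.

```python
# Components consumed per cage (assumed Fe4L6 assembly with 12 imine-forming aldehydes)
STOICHIOMETRY = {"ligand 2": 6, "Fe(NTf2)2": 4, "2-formylpyridine": 12}

# Amounts charged to the flask, in mmol (values from the procedure above)
CHARGED_MMOL = {"ligand 2": 0.0658, "Fe(NTf2)2": 0.0436, "2-formylpyridine": 0.131}


def theoretical_cage_mmol(charged, stoichiometry):
    """Return (limiting component, maximum mmol of cage it allows)."""
    cages_per_component = {
        name: charged[name] / stoichiometry[name] for name in stoichiometry
    }
    limiting = min(cages_per_component, key=cages_per_component.get)
    return limiting, cages_per_component[limiting]


limiting, max_cage_mmol = theoretical_cage_mmol(CHARGED_MMOL, STOICHIOMETRY)
cage_yield = 100.0 * 0.0103 / max_cage_mmol  # 0.0103 mmol of 3 isolated
print(f"limiting component: {limiting}, yield = {cage_yield:.1f}%")
```

The three components are charged in nearly exact 6:4:12 proportion, so the identity of the limiting component depends on rounding of the quoted amounts; the resulting yield is consistent with the reported value of about 95%.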
¹H NMR (500 MHz, CD₃CN, 298 K): δ 1.28 (s, 36H, para-Mes), 1.76 (d, 
J = 4.0 Hz, 12H, β-CH), 1.86 (s, 12H, β-CH), 2.02 (d, J = 4.0 Hz, 12H, β-CH), 
2.15 (br, 12H, Ph), 2.36 (s, 36H, ortho-Mes), 3.15 (s, 36H, ortho-Mes), 3.78–5.00 
(br, 24H, Ph), 4.77 (s, 12H, Mes), 5.27–6.43 (br, 12H, Ph), 5.51 (s, 12H, 
Mes), 6.82 (d, J = 5.5 Hz, 12H, py), 7.30 (s, 12H, imine), 7.46 (dd, J = 6.5, 
5.5 Hz, 12H, py), 7.98 (d, J = 7.5 Hz, 12H, py), 8.12 (dd, J = 7.5, 6.5 Hz, 12H, py). 

¹H NMR (500 MHz, CD₃CN, 243 K): δ 1.28 (s, 36H, para-Mes), 1.67 (d, 
J = 4.0 Hz, 12H, β-CH), 1.78 (s, 12H, β-CH), 1.93 (d, J = 4.0 Hz, 12H, β-CH), 
2.06 (d, J = 8.0 Hz, 12H, Ph), 2.23 (s, 36H, ortho-Mes), 3.15 (s, 36H, ortho-Mes), 
4.24 (d, J = 8.0 Hz, 12H, Ph), 4.34 (d, J = 8.0 Hz, 12H, Ph), 4.73 (s, 12H, 
Mes), 5.45 (s, 12H, Mes), 5.81 (d, J = 8.0 Hz, 12H, Ph), 6.80 (d, J = 5.5 Hz, 12H, 
py), 7.25 (s, 12H, imine), 7.45 (dd, J = 6.5, 5.5 Hz, 12H, py), 7.99 (d, J = 7.5 Hz, 
12H, py), 8.13 (dd, J = 7.5, 6.5 Hz, 12H, py). 

¹³C NMR (125 MHz, CDCl₃, 298 K): δ 17.2 (CH₃), 18.8 (CH₃), 20.2 (CH₃), 
116.9 (CH), 116.9 (CH), 117.1 (CH), 119.7 (CH), 122.2 (Cq), 124.8 (CH), 126.5 
(CH), 127.9 (CH), 128.2 (CH), 130.3 (CH), 131.7 (CH), 132.0 (CH), 132.7 (Cq), 
133.7 (Cq), 134.8 (Cq), 137.5 (Cq), 140.4 (CH), 146.2 (Cq), 147.7 (Cq), 148.5 (Cq), 
149.6 (Cq), 156.2 (CH), 158.1 (Cq), 159.5 (Cq), 168.3 (Cq), 169.7 (Cq), 173.4 (CH). 

¹³C NMR (125 MHz, CDCl₃, 243 K): δ 16.7 (CH₃), 18.5 (CH₃), 19.8 (CH₃), 
116.4 (CH), 116.6 (CH), 117.0 (CH), 119.1 (CH), 121.7 (Cq), 123.4 (CH), 125.9 
(CH), 127.5 (CH), 127.6 (CH), 129.8 (CH), 131.4 (CH), 131.6 (CH), 131.9 (Cq), 
133.0 (Cq), 134.4 (Cq), 137.1 (Cq), 139.9 (CH), 145.9 (Cq), 147.3 (Cq), 147.9 (Cq), 
149.2 (Cq), 155.8 (CH), 157.5 (Cq), 159.3 (Cq), 168.1 (Cq), 169.6 (Cq), 173.2 (CH). 

¹⁹F NMR (471 MHz, CD₃CN, 298 K): δ −80.5 (s, CF₃). 

ESI-TOF MS (CH₃CN): m/z 1742.3 [3 − 4NTf₂⁻]⁴⁺, 1337.9 [3 − 5NTf₂⁻]⁵⁺, 
1068.3 [3 − 6NTf₂⁻]⁶⁺, 875.7 [3 − 7NTf₂⁻]⁷⁺, 731.2 [3 − 8NTf₂⁻]⁸⁺. 
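
A quick consistency check on the ESI-TOF series is sketched below. It assumes that each peak corresponds to the cage salt after loss of n NTf₂⁻ counter-anions, giving an n+ ion with n = 4 to 8, and uses an average anion mass of 280.15 g/mol (calculated for C2F6NO4S2). The charge assignments and helper name are assumptions made for illustration; electron masses are neglected.

```python
NTF2_MASS = 280.15  # average mass of the NTf2 anion (C2F6NO4S2), g/mol

# Assumed charge state n mapped to the observed m/z value
PEAKS = {4: 1742.3, 5: 1337.9, 6: 1068.3, 7: 875.7, 8: 731.2}


def parent_mass(mz, charge, anion_mass=NTF2_MASS):
    """Mass of the neutral salt implied by an ion formed by losing `charge` anions."""
    return charge * (mz + anion_mass)


implied = {n: parent_mass(mz, n) for n, mz in PEAKS.items()}
spread = max(implied.values()) - min(implied.values())

for n, mass in implied.items():
    print(f"{n}+ ion implies a parent mass of {mass:.1f}")
print(f"spread across charge states: {spread:.2f}")
```

If the assignments are right, every charge state should point to the same parent mass within the precision of the quoted m/z values.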


Host-guest chemistry 

Cage 3 (1.0 mg, 0.12 µmol), polyaromatic guest (5 or 40 equiv.) and 
CD₃CN (0.5 ml) were added to a 2-ml glass vial. The mixture was sonicated 
for 30 s and stirred at room temperature for 1 h. The formation 
of a host–guest complex was confirmed by NMR and ESI-MS analyses. 

Extended Data Fig. 1 | Preparation of diamine 2 from norcorrole 1. When 1 was 
treated with 1,3-dibromo-5,5-dimethylhydantoin, inseparable mono-, di- and 
tribrominated products were obtained. As bromination randomly occurred at 
positions 3, 7, 12 and 16 (that is, adjacent to the mesityl moieties), the 3,12- and 
3,16-substituted dibromonorcorroles trans-1a and cis-1a were obtained as the 
main products (Supplementary Figs. 1, 2). Subsequently, disubstituted trans-1b 
and cis-1b were obtained by Suzuki–Miyaura cross-coupling at 43% yield as a 
mixture of two regioisomers (Supplementary Figs. 3, 4). Finally, 3,12-substituted 
subcomponent 2 could be isolated as a single regioisomer at 46% yield following 
reduction of the NO₂ groups and chromatography (Supplementary Figs. 5–9). 
Reagents and conditions: (i) 1,3-dibromo-5,5-dimethylhydantoin, CH₂Cl₂, −78 °C, 
1 h, 94% (mixture of trans-1a, cis-1a and other brominated species). (ii) 
4-Nitrophenylboronic acid pinacol ester, Pd(PPh₃)₄, K₃PO₄, dry THF, 70 °C, 1 h, 
43% (mixture of trans-1b and cis-1b). (iii) SnCl₂·2H₂O, dry THF, 70 °C, overnight, 
46% (isolated 2). 


Article 

Site-specific allylic C–H bond functionalization with a copper-bound N-centred radical 

https://doi.org/10.1038/s41586-019-1655-8 

Received: 27 April 2019 
Accepted: 14 August 2019 
Published online: 23 October 2019 

Jiayuan Li, Zhihan Zhang, Lianqian Wu, Wen Zhang, Pinhong Chen, Zhenyang Lin & Guosheng Liu 

Methods for selective C–H bond functionalization have provided chemists with 
versatile and powerful toolboxes for synthesis, such as the late-stage modification of a 
lead compound without the need for lengthy de novo synthesis. Cleavage of an sp³ 
C–H bond via hydrogen atom transfer (HAT) is particularly useful, given the large 
number of available HAT acceptors and the diversity of reaction pathways available to 
the resulting radical intermediate. Site-selectivity, however, remains a formidable 
challenge, especially among sp³ C–H bonds with comparable properties. If the 
intermediate radical could be further trapped enantioselectively, this should enable 
highly site- and enantioselective functionalization of C–H bonds. Here we report a 
copper (Cu)-catalysed site- and enantioselective allylic C–H cyanation of complex 
alkenes, in which a Cu(II)-bound nitrogen (N)-centred radical plays the key role in 
achieving precise site-specific HAT. This method is shown to be effective for a diverse 
collection of alkene-containing molecules, including sterically demanding structures 
and complex natural products and pharmaceuticals. 


To achieve site-selective hydrogen atom abstraction, strategies typically 
rely on substrate control, whereby C–H bonds with different bond 
dissociation energies, polarities, steric or electronic factors dictate 
the site-selectivity. Organic molecules often contain multiple 
sp³ C–H bonds having comparable properties, and therefore exhibit 
indistinguishable reactivity with reagents capable of promoting HAT 
pathways (small ΔΔG‡, Fig. 1a). Our goal was to increase the intrinsic 
reactivity differences among different C–H bonds, thereby enabling 
HAT with enhanced site-selectivity. To this end, we show that a Cu(II)-bound 
N-centred radical (NCR), with a modular sulfonamide moiety 
and bidentate ligand (Fig. 1b), acts as a tunable HAT reagent capable 
of amplifying the site-selectivity among similar allylic C–H bonds in 
complex molecules (Fig. 1c). Subsequent regio-, stereo- and enantioselective 
capture of the allylic radical by chiral Cu(II)-cyanide species 
leads to highly selective allylic C–H cyanation (Fig. 1c). This strategy 
provides a powerful tool for late-stage functionalization of complex 
alkene-containing molecules, including natural products and drugs. 

Alkenes represent an important class of functional groups in fine 
chemicals, natural products, pharmaceuticals and organic materials. A 
long-sought-after organic transformation by synthetic chemists is thus 
the asymmetric functionalization of allylic C–H bonds, which would 
grant access to highly valuable, optically pure olefinic compounds from 
readily available alkenes. Of particular importance is late-stage functionalization 
of olefin-containing bioactive molecules, which would 
provide a straightforward way to efficiently construct drug candidate 
libraries. However, after nearly 60 years of development, asymmetric 
Kharasch–Sosnovsky reactions still suffer from limited substrate 
scope (only simple cyclic alkenes) and the requirement of excess alkenes, 
not to mention challenges of site-selectivity (Fig. 1c). 

We recently developed a Cu-catalysed radical relay as an effective 
approach for the asymmetric cyanation of styrenes and benzylic C–H 
bonds. The selective capture of the benzylic radical by a chiral Cu(II) cyanide 
is the key step in constructing a C–CN bond with excellent enantiomeric 
excess. We were delighted to find that the allylic radical can 
also be regioselectively and enantioselectively trapped by chiral Cu(II) 
cyanide to deliver allylic cyanation products (see Supplementary Fig. 1). 

To survey the possible site-selective HAT, the allylic cyanation of 
trisubstituted alkene 1a bearing two sets of allylic hydrogens was performed 
(Fig. 2a). A moderate site-selectivity (C3:C7 = 5:1) was obtained 
with N-fluoroalkylsulfonamide (NFAS) 2a to give 3a at 13% yield, where 
NFAS acts as the precursor of structurally diverse NCRs. Varying substituents 
on NFAS greatly increased the selectivity and efficiency of the reaction. 
For instance, both site-selectivity and yield were greatly improved 
by introducing a bulky alkyl group on the nitrogen atom (from 2a at 13% 
yield, C3:C7 = 5:1 to 2d at 67% yield, C3:C7 = 17:1). The site-selectivity was 
further improved by introducing an electron-withdrawing aryl group 
(2e, C3:C7 = 22:1). Pleasingly, all of the reactions (Fig. 2a) produced the 
stereoisomer 3a with excellent enantiomeric excess values, indicating 
that the allylic radical generated was captured by a chiral Cu(II) cyanide 
in a highly regio- and enantioselective manner. Similar enantiomeric 
excess values (89–91%) were obtained with different NCR precursors 
2a to 2e, confirming that the enantioselective radical trapping process 
by the Cu species was independent of the hydrogen atom abstraction 
step (Fig. 1c). 
Fig. 1 | Site- and enantioselective oxidation of sp³ C–H bonds. a, HAT of 
comparable C–H bonds and the corresponding energy diagram. b, Proposed 


More importantly, the choice of ligand had a remarkable impact on 
the site-selectivity in that a lower site-selectivity was observed with 
ligand L2 compared to L1. As shown in Fig. 2a, the site-selectivity (3a, 
C3:C7) was dramatically decreased from 22:1 to 5:1. The influence of 
both NFAS and the ligand on the site-selectivity (C3:C7) suggests that 
the HAT process is not simply controlled by a free NCR. An electrophilic 
N-xanthylamide reagent has recently been used to enable sp³ C–H xanthylation, 
where free amidyl radicals were involved in the removal of 
hydrogen atoms. Inspired by this chemistry, we applied the similar 
reagent 4, which contains the same sulfonamide moiety as the electrophilic 
NFAS (N–F reagent) 2e, to probe the site-selectivity of the reaction 
of 1a under a free radical pathway (Fig. 2b). The reactions gave a much 
lower site-selectivity (5a, C3:C7 = 3.4:1) than a Cu-catalysed system (3a, 
C3:C7 = 22:1), highlighting the limitations of using free NCRs as HAT 
acceptors for site-selective C–H functionalization. High site-selectivity 
was also observed in the asymmetric allylic cyanation of non-conjugated 
acyclic alkene 1b (Fig. 2c), selectively abstracting allylic hydrogen at C3 to 
give 3b with good reactivity (64% yield) and site-selectivity (C3:C6:C7 = 
10:1:0.6). By contrast, the radical chain pathway gave xanthylation product 
5b with a poor selectivity (C3:C6:C7 = 2:1:0.5). These results show 
that the NCRs involved in the allylic cyanation are likely to bind to Cu. 

The proposed Cu(II)-bound NCRs are also supported by a number of 
experiments. First, the NFAS reagent 2f was employed to test for possible 
asymmetric radical cyclization. Enantioselective induction 
was observed with the chiral ligand L3, giving product 6 in 92% yield 
with 15% enantiomeric excess (Fig. 3a). Second, the oxidation of (L1)Cu(I) 
with the N–F reagent 2e was monitored by electron paramagnetic 
resonance spectroscopy, which showed a Cu(II) signal (Fig. 3b (i)). An 
NCR was also probed by adding a radical trap, 5,5-dimethyl-1-pyrroline-N-oxide 
(DMPO) 7 (Fig. 3b (ii)). 

Density functional theory calculations at the M06 level of theory were 
performed to gain more mechanistic insights. The reaction of (L1)Cu(I)(CN) 
with the N–F reagent 2e was first calculated, showing that the reaction 
readily gave a Cu(II)-bound NCR species (Fig. 1b and Fig. 3c). This 
Cu(II)-bound NCR species is more stable by 9.4 kcal mol⁻¹ in free energy 
than (L1)Cu(II)(CN)F + free NCR. For comparison, other possible species 
were also calculated, but are less stable than the Cu(II)-bound NCR via a 
Cu–O coordination (see Supplementary Fig. 9). The activation barriers 
for HAT of allylic C–H bonds of 1a by Cu(II)-bound NCR were then 
calculated (path b), and compared with those by the free NCR (path a). 
The results presented in Fig. 3c reveal that, upon coordination of the NCR 
to Cu(II), the barriers associated with HAT increase from 8.1 kcal mol⁻¹ to 
12.8 kcal mol⁻¹ and from 9.6 kcal mol⁻¹ to 15.4 kcal mol⁻¹, respectively. The 
corresponding ΔΔG‡ for allylic HAT from sites C3 and C7 increases from 


Cu(II)-bound NCR. c, Site-specific and enantioselective allylic C–H cyanation of 
multi-substituted alkenes (this work). R, radical; L, ligand. 


1.5 kcal mol⁻¹ to 2.6 kcal mol⁻¹ (that is, ΔΔΔG‡ = 1.1 kcal mol⁻¹), leading to 
an increase of nearly an order of magnitude in the site-selectivity. This 
computational outcome closely resembles the experimental results 
obtained from the reaction of 1a (compare C3:C7 = 3.4:1 in Fig. 2b to 
C3:C7 = 22:1 with NFAS reagent 2e in Fig. 2a, ΔΔΔG‡ = 1.2 kcal mol⁻¹). In 
short, O-coordination of NCR to the Cu(II) centre leads to a reactive 
radical that exhibits greater selectivity in its reaction with C–H bonds 
relative to the parent NCR. To our knowledge, this catalytic modulation 
of radical reactivity is unique and complements previous efforts 
to modulate reactivity by using different stoichiometric reagents. 
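
The connection between a barrier difference and a selectivity ratio can be made explicit with a simple transition-state-theory estimate, k(C3)/k(C7) = exp(ΔΔG‡/RT). The sketch below is an illustration under stated assumptions rather than the procedure used in this work: it takes T = 298.15 K, ignores statistical factors for the number of equivalent hydrogens, and uses hypothetical function and variable names.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def rate_ratio(ddg_kcal, temperature=298.15):
    """Ratio of two competing rate constants for a barrier gap ddg_kcal (kcal/mol)."""
    return math.exp(ddg_kcal / (R_KCAL * temperature))


free_ncr = rate_ratio(1.5)  # computed barrier gap for the free NCR (path a)
cu_bound = rate_ratio(2.6)  # computed barrier gap for the Cu(II)-bound NCR (path b)
predicted_gain = cu_bound / free_ncr  # equals exp(1.1 / RT)
observed_gain = 22 / 3.4  # ratio of the experimental C3:C7 selectivities

print(f"free NCR {free_ncr:.1f}:1, Cu-bound {cu_bound:.1f}:1")
print(f"predicted gain {predicted_gain:.1f}x, observed gain {observed_gain:.1f}x")
```

In this simplified picture the absolute ratios (roughly 13:1 and 80:1) overshoot the measured ones, but the predicted gain in selectivity on binding to Cu is close to the observed factor of about 6.5, which is the sense in which the 1.1 and 1.2 kcal mol⁻¹ values agree.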

To explore the scope of the site- and enantioselective allylic C–H cyanation 
reaction, two NCR precursors 2d and 2e (Fig. 2a) were applied 
to a wide range of alkenes (Fig. 4a (i)). Internal alkenes, including 
vinyl(hetero)arenes (1c to 1e), enyne (1f), vinylboronic ester (1g) and 
allylic imide (1h), proved to be viable substrates, providing the desired 
E-stereoisomers 3c to 3h in good yields (42–91%) with excellent regio- 
(>20:1) and enantioselectivity (87–96% enantiomeric excess). Various 
functional groups (such as phthalimide, silyl and boronic ester) and 
heterocycles (such as furan and pyridine) can be tolerated (for more 
examples, see Supplementary Fig. 2). Notably, some enantiomerically 
enriched allylic nitriles (E-3d and E-3e) were efficiently obtained from 
the mixtures of E- and Z-isomeric alkenes synthesized from the Wittig 
reaction. The practical utility of our method in the laboratory was 
demonstrated by the gram-scale preparations of 3c and 10b (Fig. 4b). 
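
For readers who prefer enantiomer ratios, the enantiomeric excess values quoted above convert directly: a sample with e.e. = x% contains (100 + x)/2 % of the major enantiomer. The helper below is a minimal sketch with a hypothetical function name, shown for the two ends of the 87–96% range.

```python
def enantiomer_ratio(ee_percent):
    """Convert enantiomeric excess (%) into (major %, minor %) of the two enantiomers."""
    major = (100.0 + ee_percent) / 2.0
    return major, 100.0 - major


for ee in (87, 96):
    major, minor = enantiomer_ratio(ee)
    print(f"{ee}% e.e. corresponds to {major:.1f}:{minor:.1f}")
```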

Our attention then turned to trisubstituted alkenes bearing multiple 
sets of allylic hydrogens, which are challenging in terms of both reactivity 
and site-specificity (Fig. 4a (ii)). Excellent site-selectivity was observed 
for acyclic trisubstituted alkenes 1i to 1m, in which similar allylic hydrogens 
at the C3 and C4 positions are differentiated by the Cu(II)-bound 
NCR generated from 2e and (L1)Cu(I). Various functional groups, such 
as vinylsilane (E-1k), vinylimides (E-1l) and vinylester (Z-1m), are compatible 
with the standard reaction conditions, providing the corresponding 
allylic nitriles 3k to 3m with both excellent site-selectivity (>20:1) and 
enantioselectivity (89–95% enantiomeric excess). Notably, both isomers 
E-1k and Z-1k exhibited similar reactivity to give the same product E-3k 
in 67–72% yields with 95% enantiomeric excess, which can be applied to 
the synthesis of enantiomerically enriched internal allylic nitriles (11) 
via desilylation (Fig. 4c (i)). The selective allylic C–H cyanations were 
also successful with tri-alkyl substituted alkenes having three sets of 
allylic hydrogens (C1, C3 and C4), such as allylic ester and allylic imide, 
to give single isomers E-3n (60% yield, 81% enantiomeric excess) and 
E-3o (52% yield, 85% enantiomeric excess). Moreover, the Cu(II)–NCR 
also shows exceptional sensitivity for 1p, with one methylene (C3) and 
two methyl groups (C6, C7) in the allylic positions (bond dissociation 

[Fig. 2 panel titles: a, Site-selective HAT of allylic C–H bonds with Cu-bound NCRs; c, Site-selective HAT of allylic C–H bonds of a non-conjugated olefin] 

Fig. 2 | Enantioselective allylic cyanation of alkenes. a, Optimization of 
reaction conditions: 1a (0.2 mmol), N-F reagent 2 (0.6 mmol), trimethylsilyl 
cyanide (TMSCN, 0.7 mmol), CuOAc (5 mol%), ligands L1 and L2 (7.5 mol%) in C.F, 
(1.0 ml) at room temperature (RT; 20–25 °C) for 24 hours. Yield and site-selectivity 
ratio were determined by ¹H nuclear magnetic resonance (NMR) 
spectroscopy. The enantiomeric excess (e.e.) value was determined by 


energy ~83 kcal mol⁻¹) and one benzylic hydrogen (C3′, bond dissociation 
energy ~85 kcal mol⁻¹), and abstracts the allylic hydrogen at C3 selectively 
(C3:C6:C7:C3′ = 10:1:0.6:0). Furthermore, the unconjugated diene 
1q (E:Z = 1:1) with two sterically similar allylic methylene moieties showed 
good site-selectivity for the C3 position (C3:C5 = 9:1). As examples of 
substrates with a cyclic structural motif, 1r and 1s underwent excellent 
site-specific and enantioselective cyanation (91–99% enantiomeric 
excess). The bulkier tetrasubstituted alkene 1t, usually inert towards 
metal catalysis, was reactive with excellent site-selectivity to afford 
product 3t in moderate yield and good enantioselectivity. These results 
showcase that the exquisite site-selectivity of HAT by a Cu(II)-bound NCR 
is governed by very subtle differences in steric environments. 

For the optically pure allylic ester (R)-E-1u, catalysts with either enantiomer 
of ligand L1 were employed to probe stereoselectivity. Both led 
to excellent site-selectivity (>20:1) and good to excellent diastereoselectivity 
(81–91%), indicating high levels of catalyst-controlled rather 
than substrate-controlled stereoselectivity (Fig. 4a (iii)). The excellent 

[Fig. 2c data: 5b, 16% (C3:C6:C7 = 2:1:0.5), 25% conv.] 

high-performance liquid chromatography (HPLC). †The isolated yield of major 
product 3a or 3b (C3; the product with the cyano group in the carbon-3 position). 
b, Site-selective HAT by free NCR. hν, ultraviolet light. ||The isolated yield of the 
mixture of regioisomers 5a or 5b. c, Site-selective HAT of alkene 1b with two distinct 
HAT acceptors. AIBN, azobisisobutyronitrile; ‘conv.’ refers to conversion of the 
substrate. 


functional group compatibility and site-specificity enabled us to synthesize 
enantiomerically enriched compound 8, an inhibitor of dipeptidyl 
peptidase IV, via highly selective allylic C–H cyanation of alkene (S,S)-1v 
followed by deprotection of the NHBoc group (Fig. 4a (iii)). 

The radical relay method reported here is also applicable for highly 
selective late-stage functionalization of bioactive compounds, such 
as the commercial drugs altrenogest acetate, norethisterone acetate, 
genipin, brefeldin A and cyclefenil (Fig. 4b). The reaction of altrenogest 
acetate 9a, containing four different types of allylic hydrogen, exhibited 
remarkably high site-, regio- and diastereoselectivity to provide only 
one isomer 10a in 65% yield, along with the recovered 9a (22%). Nore- 
thisterone acetate 9b, which contains three different types of allylic 
hydrogens along with a terminal alkyne moiety, afforded the single 
isomer 10b selectively, with the terminal alkyne moiety intact. Diac- 
etates of genipin 9c and brefeldin A 9d, which bear three or four sets of 
allylic hydrogens along with abstractable hydrogens adjacent to ester 
groups, exhibited excellent site-specificity and diastereoselectivity, 
Experiment: ΔΔG‡ = 2.0 (C3:C7 = 22:1) and ΔΔG‡ = 0.8 (C3:C7 = 3.4:1); ΔΔΔG‡ = 1.2 kcal mol⁻¹


Fig. 3 | Mechanistic studies. a, Asymmetric radical cyclization of chiral Cu(II)-
bound NCRs. b, Electron paramagnetic resonance studies. (i) Cu(II) signal of
Cu(II)-bound NCR; (ii) Cu(II)-bound NCR adding to DMPO (7). c, Results of
density functional theory calculations (all values are in kilocalories per mole) at
the Minnesota 2006 functional (M06) level of theory. TS1 and TS2 are transition


leading to single cyanation products 10c and 10d, respectively.
Phytol, a natural product with a linear chain, was converted to ether 9e,
which contains three sets of allylic C-H hydrogens and three tertiary
C-H bonds. The reaction gave allylic cyanation product 10e in 51% yield


states. We note that (1) the Cu(II)-bound NCR species is more stable by
9.4 kcal mol⁻¹ in free energy than (L1)Cu(II)(CN)F + free NCR, and (2) C3/C7 each
has two C-H bonds and only the favourable HAT processes of H¹ and H² are
presented here (see Supplementary Figs. 11, 12).


and 74% diastereomeric excess, along with 5% cyanation of the methyl 
group (C3). Unlike the Rh-carbene-promoted C-H insertion” and Fe- 
catalysed aliphatic C-H activation”, no tertiary C-H functionali- 
zation was observed here. Finally, as an example for a rather inert 


Fig. 4 | Substrate scope and late-stage cyanation of drug derivatives. a, Substrate
scope. b, Late-stage cyanation of drugs. #All prepared with (1R,2S)-L1. c, Synthetic
applications. See Supplementary Information for experimental details; the isolated
yield of major regiostereomer or diastereomer, and the site-selectivity was
determined by ¹H NMR of the crude mixture. ‡The regioselectivity (3c to 3h) >20:1.


520 | Nature | Vol 574 | 24 OCTOBER 2019


[Fig. 4c scheme: conversions of E-3k, Z-3s and E-3c to products 12a, 12b, 13, 14, 15a and 15b; structures omitted.]


||The diastereomeric excess (d.e.) value of the product was determined by HPLC. $Single
isomer (diastereoselectivity d.r. >20:1) was observed in crude ¹H NMR. E-11:Z-11 =
10:1, §12a (d.r. = 9:1), 15a (d.r. = 9:1), 15b (d.r. = 6:1). (r.s.m. is recovered starting material.)
See the Supplementary Information for details of the Shi epoxidation and the Raney-
Ni reduction; ‘standard conditions’ refers to those used in Fig. 2a with NFAS 2e.


tetrasubstituted olefin, a drug named cyclefenil 9f was reactive to give
10f in 43% yield with 93% enantiomeric excess. 

Optically active nitriles are of great importance to the pharmaceuti- 
cal industry owing to their excellent biocompatibilities, resistance to 
metabolism, and abilities to accentuate biological activities through 
hydrogen-bonding interactions®. Results in Fig. 4a show that the radi- 
cal relay method reported here represents the most efficient method 
to date for the synthesis of various optically active allylic nitriles from 
readily available olefins. Additionally, these enantiomerically enriched 
nitriles can serve as synthetically valuable building blocks to optically 
active alkyl amines and carboxamides (Fig. 4c (ii) to (iv)). For instance, 
allylic nitrile Z-3s can be efficiently converted to alkyl nitrile 12a and 
homoallylic amine derivative 12b via selective hydrogenation with the 
stereocentres preserved. It should be noted that the chiral alkyl nitrile 13 
was obtained from allylic C-H cyanation of the E/Z mixture of alkenes 1w
and subsequent hydrogenation in one pot. Hydrolysis converts 13 to the 
corresponding chiral alkyl carboxamide 14 efficiently. In addition, the 
allylic nitrile 3c was subjected to epoxidation to give chiral epoxide 15a 
in good diastereoselectivity (9:1), which in turn underwent regioselec- 
tive ring-opening hydrofluorination to give the highly functionalized 
chiral nitrile 15b efficiently. 

This reaction illustrates that the Cu-based radical relay is a power- 
ful strategy for direct enantioselective functionalization of complex 
alkenes. Our method of site-specific allylic HAT by Cu(II)-bound NCRs
lays the groundwork for further exploration of site-specific and enan- 
tioselective C—H oxidation reactions, including those with allylic and 
other types of C-H bonds. 

Faults can slip not only episodically during earthquakes but also during transient
aseismic slip events¹⁻⁵, often called slow-slip events. Previous studies based on


observations compiled from various tectonic settings® ® have suggested that the 
moment of slow-slip events is proportional to their duration, instead of following the 
duration-cubed scaling found for earthquakes’. This finding has spurred efforts to 
unravel the cause of the difference in scaling®° “. Thanks to a new catalogue of slow-slip
events on the Cascadia megathrust based on the inversion of surface deformation
measurements between 2007 and 2017, we find that a cubic moment-duration
scaling law is more likely. Like regular earthquakes, slow-slip events also have a
moment that is proportional to $A^{3/2}$, where $A$ is the rupture area, and obey the
Gutenberg-Richter relationship between frequency and magnitude. Finally, these 
slow-slip events show pulse-like ruptures similar to seismic ruptures. The scaling 
properties of slow-slip events are thus strikingly similar to those of regular 
earthquakes, suggesting that they are governed by similar dynamic properties. 


Geodetic monitoring of strain accumulation and release along various 
subduction zones has revealed episodic events of aseismic slip along 
different megathrusts’ *. These slow-slip events (SSEs) are typically 
accompanied by a burst of weak low-frequency seismic signals called 
tremors’*”. The characteristics of these slow earthquakes compiled 
from different subduction zones® suggest that their moment, $M_0$
(defined as the integral of slip over the fault area multiplied by the shear
modulus), is proportional to their duration, $T$. It has therefore been
inferred that SSEs, which obey $M_0 \propto T$, and earthquakes, which obey’
$M_0 \propto T^3$, correspond to distinct modes of slip®. The cubic scaling is
expected for circular ruptures with a constant stress drop expanding
at a constant rate’, a kinematic model close to the dynamic circular crack
model’’ that fits most properties of earthquakes to first order. The
moment-duration scaling should, however, transition to $M_0 \propto T$ for the
larger ‘bounded’ ruptures that span the full along-dip width of the seis- 
mogenic zone”. This transition is hardly seen in seismicity catalogues 
as they are dominated by smaller, unbounded events’. By contrast, 
only the larger SSEs are generally detected with geodetic techniques, 
and they generally show strongly elongated rather than circular slipped 
areas, suggesting bounded ruptures. This consideration led to the sug- 
gestion” that the different scaling between regular earthquakes and 
SSEs arises because earthquake catalogues are dominated by unbounded 
ruptures whereas SSEs mostly represent bounded ruptures. An alterna- 
tive view is that the difference of scaling between earthquakes and SSEs 
reflects fundamentally different dynamics°™. 
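
The two scaling regimes discussed above can be sketched with a short derivation. This is a schematic argument rather than the authors' own calculation. It assumes a constant stress drop $\Delta\tau$, a constant rupture-front speed $v_r$ and, for the bounded case, a fixed along-dip width $W$. The symbols $v_r$, $W$, $L$, $\bar{D}$ (mean slip) and $\mu$ (shear modulus) are notation introduced here for illustration, and constant prefactors are dropped.

```latex
% Unbounded (circular) rupture of radius r = v_r T:
A = \pi r^2 = \pi v_r^2 T^2, \qquad
M_0 = C\,\Delta\tau\,A^{3/2} \;\propto\; \Delta\tau\, v_r^3\, T^3 .

% Bounded rupture of fixed width W and length L = v_r T:
\bar{D} \;\propto\; \frac{\Delta\tau\, W}{\mu}, \qquad
M_0 = \mu\,\bar{D}\,W L \;\propto\; \Delta\tau\, W^2\, v_r\, T .
```

Under these assumptions the exponent drops from 3 to 1 once the rupture saturates the available width, which is the transition invoked for bounded ruptures.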

In this study, we take advantage of a recent catalogue of SSEs from 
Cascadia’, which was obtained from the inversion of geodetic posi- 
tion time series recorded at 352 continuous Global Positioning System 
(GPS) stations, from the Pacific Geodetic Array and the Plate Boundary 
Observatory, during the time period 2007.000 to 2017.632 (decimal year 


notation). After extracting a secular trend average through the SSEs from 
the time series, and deducing from it the pattern of coupling that repre- 
sents the degree of locking of the plate interface (Fig. 1), we corrected the 
data for hydrological effects as well as for co-seismic and post-seismic 
deformation. These corrected time series were used to image spatio- 
temporal variations of slip along the megathrust (Fig. 1). The catalogue 
of SSEs extracted from the slip model history on the whole megathrust 
contains 64 events, most of which coincided with the spatio-temporal 
distribution of tremors (Fig. 2), as was found in previous similar studies²⁰,²¹.
Individual events show unidirectional or bidirectional ruptures
with a rupture front velocity between about 5.5 km d⁻¹ and 11 km d⁻¹
(ref. '°). The larger ones show pulse-like behaviour very similar to large 
earthquake ruptures” but with much lower propagation and slip rates. 
Figure 1 shows the cumulated distribution of slip resulting from all 64 
SSEs. As shown previously”’, the zone of episodic slow slip and tremors 
closely follows the intersection of the forearc Moho with the megathrust 
and is separated from the shallower locked zone by a 40-km-wide band 
of steady creep (Fig. 1). The catalogue contains SSEs with a relatively 
wide range of sizes spanning moment magnitudes, $M_w$, between about
5.3 and 6.8 (Fig. 2), allowing the investigation of the scaling properties 
of a population of SSEs that all happened in a relatively narrow range 
of conditions. 

The moment and duration of the largest events of this catalogue fall 
in the slow-slip domain identified by Ide et al.° (red shading in Fig. 3b 
and in Extended Data Fig. 1), but our data do not follow the linear scaling 
proposed in that study, and align better along the $M_0 \propto T^3$ scaling of
earthquakes. This dataset, however, suffers from a bias because a low- 
pass temporal filter with a cut-off period of about 30 days was applied 
to the time series. To refine the analysis and alleviate the possibility of 
a bias introduced by the automatic picking of the onset and end of 
Fig. 1 | Comparison of interseismic coupling with cumulated slip due to
episodic slow slip between 2007 and 2017. Data are from ref. *. The cumulated 
slip due to all of the 64 SSEs forms a band that follows the intersection of the 
forearc Moho with the megathrust and is disconnected from the shallower 
locked portion of the megathrust, whether the trench is assumed locked (as 
here) or not. Interseismic coupling, defined as the rate of slip deficit due to 
locking in the interseismic period divided by the long-term slip rate, and the 
long-term forearc motion were determined from the secular GPS velocities. 


the SSEs, we carried out manual measurements using time series 
filtered with a shorter cut-off period of approximately 9 days (see Supplementary
Information and Extended Data Fig. 2 for details). As a caution,
we removed 17 events that we considered questionable, and we
combined seven pairs of events owing to their closeness in time and 
space. The final revised catalogue consists of 40 events. 

For each event, we estimate minimum and maximum durations and 
find the same trend as the original catalogue. We next use the revised 
dataset to search for the best-fitting scaling law, taking into account 
uncertainties in duration and magnitude, and the effect of the filter 
(see Methods for details). For example, we show in Fig. 3a where the 
filtered data should plot if $M_0 \propto T$ (orange filled circles) or $M_0 \propto T^3$ (green
filled circles) were the true relationships to generate the observations.
The root mean squared error (RMSE) for a scaling law exponent ($M_0 \propto T^c$)
of $c = 3$ is about half the value obtained for $c = 1$ and varies little for $c > 3$.
We also tested the robustness of our conclusion by using estimates of 
the SSE duration from the tremor bursts and the associated moment 
release based on our slip model (see Supplementary Information ‘Com- 
parison of tremor durations and SSE durations from geodesy’). In that 


case, no filter correction is needed. The results confirm that $M_0 \propto T^3$ is
much more likely than a linear scaling, with a reduction of about 68% in
the RMSE. We conclude that SSEs occurring under a narrow range of 
conditions (for example temperature and pressure), as is the case in the 
deep SSEs from Cascadia analysed here, follow a near cubic moment- 
duration scaling like the scaling of regular earthquakes. This finding is 
all the more unexpected since most of the SSEs in our catalogue ruptured 
the entire width of the zone of episodic slow slip (Fig. 1 and Extended 
Data Fig. 3) and therefore have large aspect ratios (Fig. 4). They would 
therefore be expected to follow a linear scaling”. It is noteworthy that,
although the cubic scaling of regular earthquakes is generally thought 
to reflect self-similarity and justified on the basis of the circular crack 
model”, the same scaling is observed in our dataset in which most rup- 
tures are very elongated, with aspect ratios of 2 to 12 (Fig. 4b). 

The original catalogue, as well as our manual measurement, also 
defines a tightly constrained scaling of moment versus rupture area, 
following approximately the $M_0 \propto A^{3/2}$ scaling of regular earthquakes
(Fig. 3f) (the best-fitting scaling law exponent is actually 1.25; see Supplementary
Information for details). The ratio $M_0/A^{3/2}$ is, however,
three orders of magnitude smaller, implying a stress drop of about
4.3 ± 2.0 kPa, based on the same circular crack model generally used to
quantify seismic ruptures? ($M_0 = C\,\Delta\tau\,A^{3/2}$, where $\Delta\tau$ is the stress drop,
$A$ the rupture area and $C = 2.44$), compared with 1-10 MPa for regular
earthquakes. This estimate of the average stress drop is, however, questionable
as the rupture areas are quite elongated (Fig. 4b). We therefore
estimated the average stress drop for each of our SSEs based on our slip
model using the approach of Noda et al.” and using Meade’s analytical
solution” for triangular sub-faults. The values range between 0.9 kPa 
and 18.0 kPa, with a mean of approximately 5.8 kPa and a standard 
deviation of 2.0 kPa. Our average stress drop is about 10 times lower 
than the value proposed by Schmidt and Gao” based on the slip model 
of 16 events with M,, in the range of 6.2-6.7 between 1998 and 2008. 
Given that the slip distributions are similar (the three common events 
are compared in Extended Data Fig. 4 and in the Supplementary Infor- 
mation), we suspect that this difference is due to the way that rupture 
area was measured by Schmidt and Gao”, the fact that our slip models 
do not account for the slip that would be needed to balance interseismic 
loading during an SSE, and the possibility that our slip models are spatially
smoother because of stronger regularization of our inversion. 
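
The circular-crack estimate quoted above can be illustrated with a minimal sketch. It inverts $M_0 = C\,\Delta\tau\,A^{3/2}$ with $C = 2.44$ for the stress drop. The conversion from moment magnitude uses the standard definition $M_w = \tfrac{2}{3}(\log_{10} M_0 - 9.1)$ with $M_0$ in N m, which is an assumption here because the text does not state which convention was used. The example event (magnitude and rupture dimensions) is hypothetical and is not taken from the catalogue.

```python
C_CIRCULAR_CRACK = 2.44  # geometric constant quoted in the text


def moment_from_magnitude(mw):
    """Seismic moment in N m from moment magnitude (standard Mw definition)."""
    return 10 ** (1.5 * mw + 9.1)


def stress_drop_pa(moment_nm, area_m2, c=C_CIRCULAR_CRACK):
    """Invert M0 = C * dtau * A**1.5 for the stress drop dtau, in Pa."""
    return moment_nm / (c * area_m2 ** 1.5)


# Hypothetical event: Mw 6.5 with a 50 km x 150 km slipped area.
m0 = moment_from_magnitude(6.5)
area = 50e3 * 150e3  # m^2
dtau = stress_drop_pa(m0, area)
print(f"M0 = {m0:.2e} N m, stress drop = {dtau / 1e3:.1f} kPa")
```

For a rupture this elongated the circular-crack constant is only a rough guide, which is the reason given in the text for also estimating stress drops directly from the slip model.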

We also examined the frequency-magnitude scaling of the SSEs 
(Fig. 4a). We show the distributions obtained from both the original 
catalogue and the revised catalogue. The data selection in the revised
catalogue results in a roll-over at lower magnitudes, but in any case we
find that the SSEs approximately show a linear relationship between
moment magnitude, $M_w$, and the decimal logarithm of the number of
events with magnitude equal to or larger than $M_w$ (that is, they obey the
Gutenberg-Richter law) with a slope (called the b value) of about 0.8.
The abrupt drop in the frequency of events larger than $M_w = 6.4$ suggests
a truncation effect. The truncation cannot be explained by the
transition from unbounded ruptures to bounded ruptures in width
as this transition would occur at a much lower magnitude, $M_w = 5.7$,
given the aspect ratio of the ruptures. It could be due, instead, to
the along-strike segmentation discussed below. As there are only
11 events with $M_w > 6.4$, however, this observation should be considered
with caution. A previous study had also argued for SSEs obeying
the Gutenberg-Richter law?’ but used moment inferred from dura- 
tion, assuming linear proportionality between moment and duration. 
It seems that the conclusion holds despite this probably incorrect 
scaling assumption. 

Finally, we note that the zone of SSEs can be divided into a discrete 
number of segments that slip systematically as a whole, either inde- 
pendently or jointly (Fig. 2). From the rupture patterns, cumulative slip 
distribution and number of times that a sub-fault has slipped (Extended 
Data Fig. 3), we defined 13 segments (separated by dashed red lines in 
Fig. 2 | Spatio-temporal distribution and segmentation of SSEs. a, Timeline
with magnitudes, labelled by event number, of all 64 SSEs of our original 
catalogue”. b, Timing and rupture extent of the SSEs. The black dots indicate 
tremors. The catalogue from Ide* is used until 2009.595, and the catalogue 


Fig. 2b). Segments 1 and 2 are extremely strongly coupled. They mostly 
rupture together except for a rupture in July 2014 (2014.612) that was 
restricted to segment 2. Segment 7 ruptured in combination with segments
6 and 8 in 2014, but never by itself. The segmentation of the Cascadia
SSE zone had already been noticed’, and a similar segmentation
is observed in Japan”. This segmentation is qualitatively similar to the 
segmentation defined by regular megathrust earthquakes*°”. 
In conclusion, the $M_0 \propto T$ scaling proposed by Ide et al.° probably arises
from the assembling of SSEs occurring under different conditions. We 
suspect that, as described here for the particular case of the SSEs in 
Cascadia, any subset of SSEs under similar conditions would yield a 
cubic scaling law as we found here. The along-strike segmentation, fre- 
quency-magnitude distribution and scaling properties of SSEs on the 
Cascadia subduction zone are thus remarkably similar to those of regu- 
lar earthquakes. The pulse-like propagation of individual events also 
looks very similar to the seismic ruptures as inferred for large SSEs in 
the context of the Mexican subduction™. We infer that the dynamics 
governing aseismic SSEs is not very different from the dynamics govern- 
ing seismic ruptures, a surprising result given that seismic ruptures are 
commonly thought to be governed by inertial effects, which are needed 
to account for the radiated seismic waves, but should not have any role 
for SSEs. Unexpectedly, our results call for re-examination of the cause 
of the $M_0 \propto A^{3/2}$ scaling, as it seems that, at least in the case of SSEs, the
explanation based on self-similarity and the circular crack model will
not hold. A re-examination of the effect of geometric bounding on scaling
properties of regular earthquakes as well as SSEs may be needed.
The similar scaling properties of SSEs and regular earthquakes suggest 
that SSEs might be useful in developing and testing dynamic models of 
from PNSN (https://pnsn.org/tremor) is used thereafter. The vertical green
dashed line marks the separation between the two catalogues. The dashed red 
lines indicate the segment boundaries defined as the rupture ends (see also 
Extended Data Fig. 3). 


earthquake sequences that are difficult to constrain from observations 
of regular earthquakes, especially given the long return period of large 
earthquakes. 
taking into account magnitude and duration uncertainties and temporal

filter effect (see Methods). The RMSE for c=3 (green dot) is half that of c=1 
(yellow dot), and only 10% larger than the best fitting value c= 5.09 (blue dot). 
d, Relationship between moment and rupture area of SSEs compared to the 3/2 
scaling of regular earthquakes. e, Comparison with regular earthquakes (green 
shading). Stress drop isolines (dotted grey lines) are based on the circular crack 
model’. f, Data fit assuming $M_0 \propto A^d$, taking into account the moment
uncertainty (see Methods). The RMSEs for d=1.25 (best value) and d=1.5 are 
indicated by the green and blue dot, respectively. 
Fig. 4 | Frequency-magnitude distribution and aspect ratio of SSEs in
Cascadia. a, Logarithm of the number of SSEs with moment magnitude larger 
than the value in abscissa using the original catalogue’ (blue filled circles) and 
the revised catalogue (orange filled circles). Like regular earthquakes, SSEs are 
observed to follow approximately a linear trend, that is, the Gutenberg-Richter 


relationship (see Methods). The apparent larger b value at $M_w > 6.4$, defined by
only 11 events, could suggest that the distribution is truncated possibly as a result
of the along-strike segmentation. b, Aspect ratio of rupture areas. 

See Supplementary Information for details about measurements of area and 
aspect ratio. 
Methods


Moment-duration scaling 

We determine the best-fitting moment-duration scaling law, 
$\log_{10}(T) = \frac{1}{c}\log_{10}(M_0) + g$, where $M_0$ is expressed in N m and $T$ in
seconds. We take into account the uncertainties in SSE duration and
moment (see Supplementary Information) and the effect of the tem- 
poral filter. We grid search for the best exponent, c, and intercept, g, of 
the scaling law. For each pair of exponent and intercept values, 1,500 
random catalogues of 40 SSEs are created, assuming a uniform prob- 
ability between the minimum and maximum values of moment and 
duration. We then compare these catalogues with the moments and 
durations of synthetic catalogues. The events in the synthetic catalogues 
have the same magnitudes as in the random catalogue, and thus the
same final moment released, but a duration prescribed by the
tested scaling law. To account for the filter, we generate synthetic time
series assuming a boxcar moment rate function with a moment rate equal
to the final moment divided by the prescribed duration during the event (and 0 N m d⁻¹ otherwise). We apply the same
filter as to the real data (a zero-phase digital filtering, using a 5-day win- 
dow), and estimate durations from the filtered moment rate functions 
(we take a moment rate threshold of Mo;hresh= 6-63 Nm d™, equivalent 
to the case of the smallest subfault, as defined by the SSE kinematic 
model, slipping at 40 mm yr⁻¹ with a shear modulus of 30 GPa). Finally,
for the tested exponent and intercept, the RMSE is calculated between
the durations of the 40 x 1,500 events produced and their associated 
smoothed synthetics. The range of values explored for the intercept, g, 
and exponent, c, spans from −35 to 7, and 0.5 to 9, respectively, using
a step of 0.01 for both. The minimum RMSE for each tested exponent is
shown in Fig. 3c. The best fit corresponds to c = 5.09 but is only about 
8% smaller than the RMSE obtained for c=3. 
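
The grid search described above can be sketched as follows. This is a simplified illustration, not the authors' code. It omits the temporal-filter step, works in $\log_{10}$ of duration, and uses coarse grids and few random catalogues to stay fast. The function names, the synthetic test catalogue and its ±1% bounds are all hypothetical.

```python
import math
import random


def rmse_for_law(events, c, g, n_catalogues, rng):
    """RMSE in log10(duration) between sampled catalogues and the law
    log10(T) = log10(M0) / c + g."""
    squared_sum = 0.0
    count = 0
    for _ in range(n_catalogues):
        for m_lo, m_hi, t_lo, t_hi in events:
            m0 = rng.uniform(m_lo, m_hi)  # moment, N m
            t = rng.uniform(t_lo, t_hi)   # duration, s
            predicted = math.log10(m0) / c + g
            squared_sum += (math.log10(t) - predicted) ** 2
            count += 1
    return math.sqrt(squared_sum / count)


def grid_search(events, exponents, intercepts, n_catalogues=20, seed=0):
    """Return {c: minimum RMSE over all tested intercepts}."""
    best = {}
    for c in exponents:
        best[c] = min(
            # Re-seeding gives every (c, g) pair the same random catalogues.
            rmse_for_law(events, c, g, n_catalogues, random.Random(seed))
            for g in intercepts
        )
    return best


# Hypothetical catalogue built to follow T = M0**(1/3) exactly, with
# +/-1% bounds standing in for the measurement uncertainties.
events = []
for i in range(11):
    m0 = 10 ** (17.0 + 0.2 * i)
    t = m0 ** (1 / 3)
    events.append((0.99 * m0, 1.01 * m0, 0.99 * t, 1.01 * t))

exponents = [1, 2, 3, 4, 5]
intercepts = [-15 + 0.1 * i for i in range(201)]  # -15 to 5
best = grid_search(events, exponents, intercepts)
best_c = min(best, key=best.get)
print(best_c, best)
```

Reusing the same random draws for every tested pair keeps the comparison between exponents free of sampling noise. That is a design choice of this sketch, not something stated in the Methods.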


Moment-area scaling 

We use a similar procedure to search for the best-fitting moment-area
scaling law, $\log_{10}(A) = \frac{1}{d}\log_{10}(M_0) + r$, taking into account the uncertainties
in SSE moments. We grid search for the best exponent, d, and
intercept, r. The area is expressed in km? and the moment in N m. For 
each pair of tested exponent and intercept, 1,500 random catalogues 
of 40 SSEs are created, assuming a uniform probability distribution 
between the estimated minimum and maximum values of moments. For 
each of these catalogues, an associated synthetic catalogue is created 
with areas prescribed to follow the tested scaling law. For each tested 
exponent and intercept, a RMSE is then calculated between the areas 
of the 40 x 1,500 events produced and their associated synthetics. The 
tested values of the intercept, r, and exponent, d, range from −15 to −1.5
and from 1 to 2.5, respectively, using a step equal to 0.01 for both. The
minimum RMSE for each exponent tested is shown in Fig. 3f and the 
best fit corresponds to an exponent equal to 1.25. 


Measurement of SSE rupture area and aspect ratio 

We define $\delta_{\mathrm{deficit}}$ as the deficit of slip rate with respect to the long-term
slip rate. The onset of an SSE is defined by $\delta_{\mathrm{deficit}} < V_{\mathrm{thresh}}$. We set the slip
deficit rate threshold, $V_{\mathrm{thresh}}$, to −40 mm yr⁻¹. The area of an SSE is defined
as the sum of the areas of the sub-faults with $\delta_{\mathrm{deficit}} < V_{\mathrm{thresh}}$ (ref. 5). This
threshold is applied to a 30-day filtered $\delta_{\mathrm{deficit}}$. We include the neighbouring
sub-faults in the calculation of the slipping area. We thus estimate


the lengths and widths of the SSEs relative to a mean strike line that 
approximately follows the curved geometry of the megathrust and runs 
through the middle of the cumulated slip distribution of SSEs (Extended 
Data Fig. 3). For each SSE, the rupture length is defined as the distance 
between the northern and southern intersections of the rupture’s out- 
line with the mean strike line. The width is defined as twice the mean 
distance between the rupture’s outline and the mean strike line. Because
some SSE ruptures are not centred over it or might not even cut it, we 
shift the mean strike line along dip for each SSE, forcing it to pass through 
its slipping area where the measured length is maximum. 

Note that the spatio-temporal extent of the SSEs is sensitive to the inversion
regularization, the temporal filter applied to $\delta_{\mathrm{deficit}}$, and the chosen
value for $V_{\mathrm{thresh}}$.


Determination of magnitude-frequency distribution 

The magnitude-frequency distribution for the revised catalogue in 
Fig. 4a is calculated taking into account the uncertainties in SSE mag- 
nitudes calculated in Supplementary Information ‘Measurements of 
SSE duration and moment release’. We assume that each event has a 
uniform distribution within its moment uncertainty and sum all of those 
distributions. The resulting probability density function, $p_{\mathrm{events}}$, gives the
number of events per magnitude. We then calculate for each magnitude
tested, $M_{\mathrm{test}}$, the number of events greater than $M_{\mathrm{test}}$ per year:

$$N = \int_{M_{\mathrm{test}}}^{\infty} p_{\mathrm{events}}(M_w)\,\mathrm{d}M_w$$


The b value of the Gutenberg-Richter distribution that best fits the 
original catalogue (64 events) is estimated to be 0.78 by using 
the maximum likelihood method³⁴. We do not estimate the b value
for the revised catalogue owing to the rollover at lower magnitudes due 
to the data selection. 
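
The two calculations in this section can be sketched as below. The first function sums uniform magnitude distributions to obtain a cumulative count. The second applies Aki's maximum-likelihood formula, $b = \log_{10}(e) / (\bar{M} - M_{\min})$, in its continuous-magnitude form. The function names, the `years` normalization and the example values are assumptions made for illustration and are not taken from the paper.

```python
import math


def fraction_above(lo, hi, m_test):
    """Share of a uniform magnitude interval [lo, hi] lying above m_test."""
    if m_test <= lo:
        return 1.0
    if m_test >= hi:
        return 0.0
    return (hi - m_test) / (hi - lo)


def cumulative_count(intervals, m_test, years=1.0):
    """Expected number of events per year with magnitude above m_test.

    Dividing by the observation span `years` is an assumption of this sketch.
    """
    return sum(fraction_above(lo, hi, m_test) for lo, hi in intervals) / years


def aki_b_value(magnitudes, m_min):
    """Aki (1965) maximum-likelihood b value for magnitudes >= m_min."""
    selected = [m for m in magnitudes if m >= m_min]
    mean_excess = sum(selected) / len(selected) - m_min
    return math.log10(math.e) / mean_excess


# Hypothetical magnitude bounds for three events, and a hypothetical catalogue.
intervals = [(5.8, 6.0), (6.0, 6.4), (6.3, 6.5)]
n_above = cumulative_count(intervals, 6.2)
b = aki_b_value([5.5, 6.0, 6.5], m_min=5.5)
print(n_above, round(b, 3))
```

For binned magnitudes a half-bin correction to `m_min` is commonly applied; it is left out here for brevity.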


Data availability 


The durations and moments estimated in this study are listed in Extended 
Data Table 1 and in the Source Data of Fig. 3. The slip model of Michel 
et al.’, which is used as input in this study, is available at:
ftp://ftp.gps.caltech.edu/pub/avouac/Cascadia_SSE_Nature/Data_for_Nature/.


34. Aki, K. Maximum likelihood estimate of b in the formula log N = a- bM and its confidence 
limits. Bull. Earthquake Res. Inst. 43, 237-239 (1965). 
experienced an SSE. The black contours delimit the extent of each SSE. The
dashed black lines in a and b correspond to the selection of segments.
Extended Data Table 1 | Manual estimation of SSE duration

SSE #   Start (max. duration)   Start (min. duration)   End (min. duration)   End (max. duration)

3 2007.0267 2007.0294 2007.128 2007.1773 
4 2007.0294 2007.0733 2007.1034 2007.1472 
5&6 2007.422 2007.4264 2007.5058 2007.5579 
7 2007.4576 2007.4839 2007.5397 2007.5934 
8 2007.491 2007.5318 2007.5852 2007.6099 
9 2008.2286 2008.2642 2008.3025 2008.3464 
10 2008.316 2008.3265 2008.4477 2008.4627 
12 2008.4969 2008.5298 2008.5626 2008.6174 
13 2008.8939 2008.924 2008.9569 2008.9745 
14 2009.1266 2009.166 2009.213 2009.2197 
15 2009.1759 2009.179 2009.2135 2009.251 
16 2009.3183 2009.3238 2009.436 2009.4552 
18 2009.429 2009.485 2009.5325 2009.5839 
19 2009.5579 2009.587 2009.6989 2009.7125 
22 & 23 2010.067 2010.0921 2010.1355 2010.178 
24 2010.5859 2010.5887 2010.7064 2010.7324 
26 2010.9993 2011.0431 2011.069 2011.087 
27 2011.347 2011.372 2011.3936 2011.4867 
28 2011.3689 2011.425 2011.5031 2011.6071 
29 2011.3717 2011.4275 2011.451 2011.4747 
30 2011.5305 2011.555 2011.6865 2011.7276 
33 2011.7345 2011.796 2011.841 2011.864 
34 2012.609 2012.6684 2012.791 2012.7926 
36 2012.7242 2012.7269 2012.7844 2012.843 
37 2012.7445 2012.7998 2012.8429 2012.8638 
38 & 39 2013.1403 2013.1814 2013.305 2013.3758 
40 2013.5401 2013.562 2013.5852 2013.6934 
41 2013.6769 2013.682 2013.776 2013.781 
43 2014.0274 2014.1314 2014.2108 2014.216 
44 2014.119 2014.1218 2014.1971 2014.2357 
45 & 46 2014.333 2014.438 2014.4928 2014.5051 
47 & 50 2014.5914 2014.6051 2014.7135 2014.746 
48 2014.6215 2014.6516 2014.69 2014.7392 
54 2014.857 2014.8597 2014.955 2014.9802 
53 2015.7276 2015.7851 2015.8371 2015.8535 
54 & 55 2015.9521 2015.974 2016.168 2016.1711 
56 2015.9202 2015.9603 2015.999 2016.0178 
59 2017.039 2017.1239 2017.279 2017.2827 
62 & 63 2017.2981 2017.2991 2017.3484 2017.4086 
64 2017.5428 2017.5715 2017.603 2017.606 


The start and end time picks for the minimum duration estimation are determined by the timing of the
first and last sub-fault with $\delta_{\mathrm{deficit}} < V_{\mathrm{thresh}}$. The start and end time picks for the maximum duration
estimation are determined by the timing of the first and last sub-fault for which $\delta_{\mathrm{deficit}} < 0$. The SSE
durations reported here are affected by the bias from the filter cut-off of about 9 days (see Supplementary
Information ‘Measurements of SSE duration and moment release’).


Article 


Morphology of the earliest reconstructable 
tetrapod Parmastega aelidae 


https://doi.org/10.1038/s41586-019-1636-y 


Received: 8 April 2019 


Pavel A. Beznosov¹, Jennifer A. Clack², Ervīns Lukševičs³, Marcello Ruta⁴ & Per Erik Ahlberg⁵*


Accepted: 10 September 2019 


Published online: 23 October 2019 


The known diversity of tetrapods of the Devonian period has increased markedly in 
recent decades, but their fossil record consists mostly of tantalizing fragments! ©. The 
framework for interpreting the morphology and palaeobiology of Devonian tetrapods 
is dominated by the near complete fossils of Ichthyostega and Acanthostega; the less
complete, but partly reconstructable, Ventastega and Tulerpeton have supporting 
roles”*"*3*, All four of these genera date to the late Famennian age (about 365- 

359 million years ago)—they are 10 million years younger than the earliest known 
tetrapod fragments*””, and nearly 30 million years younger than the oldest known 
tetrapod footprints®. Here we describe Parmastega aelidae gen. et sp. nov., a tetrapod
from Russia dated to the earliest Famennian age (about 372 million years ago), 
represented by three-dimensional material that enables the reconstruction of the skull 
and shoulder girdle. The raised orbits, lateral line canals and weakly ossified postcranial 
skeleton of P. aelidae suggest a largely aquatic, surface-cruising animal. In Bayesian and
parsimony-based phylogenetic analyses, the majority of trees place Parmastega as a
sister group to all other tetrapods. 


The rate of discovery of Devonian tetrapods accelerated greatly during 
the late twentieth and early twenty-first centuries. The description of 
Ichthyostega in 1932 was followed by Acanthostega in 1952, Metaxygnathus
in 1977 and Tulerpeton in 1984; all other descriptions or identifications
of genera (Hynerpeton, Ventastega, Elginerpeton, Obruchevichthys,
Densignathus, Sinostega, Jakubsonia, Ymeria, Webererpeton, Tutusius 
and Umzantsia) as Devonian tetrapods have occurred since 1994116", 
Unnamed Devonian tetrapod material has previously been described 
from Belgium” and the USA". However, the fossils of Ichthyostega and 
Acanthostega from East Greenland*"**! remain by far the most complete 
material for Devonian tetrapods, followed by Ventastega fossils from 
Latvia**"” and Tulerpeton fossils from Russia”**™. All of these date to the 
final stage of the Devonian period (the late Famennian), by which point 
tetrapods had been in existence for about 30 million years (judging by 
evidence from trackways®”*) and had colonized both equatorial and 
polar environments”. Substantial differences between these four genera 
hint at long, divergent evolutionary histories; notably, the Ichthyostega
and Acanthostega fossils have braincases that are fundamentally dis- 
similar to each other”. 

The tetrapod material described here is securely dated to the earliest 
Famennian age, and is comparable to that of Ventastega in its degree 
of completeness. It is derived from the Sosnogorsk Formation of the 
southern part of Timan Ridge (Komi Republic, Russia)’, which straddles 
the boundary between the Frasnian age (about 382-372 million years 
ago) and the Famennian age; vertebrate remains occur only in the part
of the Sosnogorsk Formation that dates to the Famennian age (Extended 
Data Fig. 1). The material described here is thus only marginally younger 
than the oldest known tetrapods Elginerpeton, Obruchevichthys and 
Webererpeton, which are known only from fragmentary material>””. 


The quality of the material described here, which consists of numer- 
ous isolated bones and some articulated skull regions, is excellent. 
Multiple examples of the same bone all show the same distinctive fea- 
tures (Extended Data Fig. 2), which indicates that only a single tetrapod 
species is present (Extended Data Fig. 3). These fossils give a detailed 
picture of an animal from the earliest part of the known body-fossil 
record of the tetrapods. 


Systematic palaeontology 


Tetrapoda Jaekel, 1909 
Parmastega aelidae gen. et sp. nov. 


Remarks. The term Tetrapoda is used here in its traditional, apomorphy-based
sense of limbed vertebrates.


Etymology. The generic name derives from parma, a word in the Komi 
language describing the landscape of hills covered by coniferous forest, 
typical for South Timan, and stégi (Greek) meaning roof, understood 
here as the skull roof; aelidae honours Aelida I. Popova (Syktyvkar State
University) (1929-2011), who first inspired P.A.B.'s interest in the natural
sciences. 


Holotype. Institute of Geology, Komi Science Centre (IG KSC) 705/1, an 
articulated snout region (Fig. 1a–c).


Referred material. One hundred and six individual bones or bone 
assemblies (Supplementary Table 1). 


¹Institute of Geology, Komi Science Centre, Ural Branch of the Russian Academy of Sciences, Syktyvkar, Russia. ²University Museum of Zoology, University of Cambridge, Cambridge, UK.
³Department of Geology, University of Latvia, Riga, Latvia. ⁴Joseph Banks Laboratories, School of Life Sciences, University of Lincoln, Lincoln, UK. ⁵Department of Organismal Biology, Uppsala

Fig. 1 | Skull roof, cheek and palate of P. aelidae. a–c, Specimen IG KSC 705/1
(all numerical codes in the figure legends refer to specimen numbers in the IG 
KSC collections), the holotype of P. aelidae. Articulated ethmosphenoid with 
associated prefrontal in ventral (a), dorsal (b) and lateral (c) views. d, e, 705/2. 
Skull table in dorsal (d) and ventral (e) views. f, g, 705/17. Skull table and partial 
braincase in ventral view. g, False-colour image identifying the components of 
705/17. h, 705/18. Right frontal, dorsal view. i, 705/19. Left postorbital, external 
view. j, 705/20. Left jugal, external view. k, 705/25. Left lacrimal, lateral (top) and
dorsal (bottom) views. l, 705/26. Right squamosal, external view. m, 705/5. Right
prefrontal, external view. n, 705/4. Left postfrontal, lateral (top) and dorsal
(bottom) views. o, 705/28. Right maxilla in internal (top), ventral (middle) and
external (bottom) views. p, 705/29 (left dermopalatine), 705/30 (ectopterygoid) 
and 705/31 (pterygoid) in ventral view. q, 705/32. Left dermopalatine in lateral 
(top) and ventral (bottom) views. cho, choana; fr, frontal; jo, jugal overlap; mr, 
median rostrals; na, nasal; om, orbital margin; pa, parietal; pc, preopercular 
contact; pi, pineal foramen; pmx, premaxilla; pll, postorbital lateral line; poo, 
postorbital overlap; pp, postparietal; prf, prefrontal; psp, parasphenoid; ptf, 
posttemporal fossa; qjc, quadratojugal contact; sc, semicircular canals; socc, 
supraoccipital; sn, spiracular notch; sr, spiracular recess; su, supratemporal; ta, 
tabular; te, tectal; vo, vomer. Scale bars, 10 mm (scale bar below f applies to f, g;
the scale bar below c applies to all other panels).




Locality and horizon. Sosnovskiy Geological Monument, on the right 
bank of the Izhma River opposite Sosnogorsk (Komi Republic, Rus- 
sia); Sosnogorsk Formation, lowermost Famennian age (Extended Data 
Fig. 1). 


Diagnosis. A stem tetrapod diagnosed by the following unique combina- 
tion of characters: dermal ornament of preorbital region developed into 
transverse parallel ‘wave crests’ with a spacing of a few millimetres; ornament
present on dorsal blade of cleithrum and on anocleithrum; orbit
strongly raised above skull roof, framed by an anterodorsal crest and a
vertical anterior ridge carried on prefrontal; internasal fontanelle absent; 
median rostral paired; lacrimal excluded from orbit by prefrontal- 
jugal contact; intertemporal absent; pterygoids separated in midline 
by parasphenoid; interpterygoid vacuities absent; pterygoid dentition 
restricted to two lines of denticles, running anteriorly and anterolater- 
ally from growth centre; ectopterygoid making large contribution to 
lateral wall of subtemporal fossa; middle part of otic capsule narrow, 
occupying approximately half of skull table width; posttemporal fossa 
wide, triangular; fang pair and row of marginal teeth on adsymphysial
plate; middle part of prearticular with large muscle scar; interclavicle
rounded with short posterior process.

Fig. 2 | Lower jaw and pectoral girdle of P. aelidae. a, 705/21. Right
adsymphysial plate in mesial (top) and dorsal (bottom) views. b, 705/22. Right
anterior coronoid in mesial (top) and dorsal (bottom) views. c, 705/33. Right
middle coronoid in mesial (top) and dorsal (bottom) views. d, 705/36. Left
posterior coronoid in mesial (top) and dorsal (bottom) views. e, 705/37.
Articulated left splenial and adsymphysial plate in ventrolateral (top) and mesial
(bottom) views. f, 705/34. Articulated left postsplenial, angular and surangular
in lateral view. g, 705/76. Left prearticular in mesial view. h, 705/67. Right dentary
in lateral (top), dorsal (middle) and mesial (bottom) views. i–k, 705/15. Left
cleithrum and partial scapulocoracoid in mesial (i), anterior (j) and lateral (k)
views. l, 705/95 (right cleithrum) and 705/98 (anocleithrum) in lateral view. m,
705/98. Right anocleithrum in lateral view. n, o, 705/92 (right clavicle) and
705/89 (interclavicle) in anterior (n) and ventral (o) views. p, 705/102. Left
coracoid in lateral view. Scale bars, 10 mm (scale bar between a–d applies to
these four panels; e–p are all shown at the same scale).


Description 

The Parmastega material comprises the entire dermal skull (apart from 
the preopercular and the posterior part of the quadratojugal), the entire 
ethmoid and dorsal part of the otoccipital braincase, the entire lower 
jaw, the dermal pectoral girdle and the partly ossified scapulocoracoid 
(Figs. 1, 2). The material consists of a total of 106 numbered specimens 
(Supplementary Tables 1, 2), which represents a minimum of 11 indi- 
viduals; these individuals show a wide range of sizes (Extended Data 
Figs. 2,4), but were found within a small area of the site (Extended Data 
Fig. 1). Most specimens are isolated bones, but an articulated ethmoid 
(Fig. 1a–c) and several skull tables (Fig. 1d–g) are also present. The bones
are three-dimensionally preserved in limestone with little or no dis- 
tortion, and have been freed from the matrix using dilute acetic acid 
(Methods). Bones from the same individual can sometimes be identified 
by matching size and sutural fit (Extended Data Fig. 3). This allows us to 
reconstruct the skull, lower jaw and pectoral girdle with a high degree of
confidence, except for the posterior part of the suspensorium (Fig. 3). 
Assuming proportions similar to those of Acanthostega”, the maximum 
length of Parmastega was approximately 130 cm. 

The shape of the skull is broadly similar to that of Ventastega and 
Acanthostega, although the orbits of Parmastega are raised higher above 
the skull table and the snout has a distinctly concave profile (Extended 
Data Fig. 4). The strongly raised orbits and relatively narrow snout are
reminiscent of the elpistostegids Elpistostege and Tiktaalik®”. However,
the orbits of Parmastega are proportionately larger than those in the
elpistostegids (Extended Data Fig. 5).

Fig. 3 | Reconstructions of P. aelidae. a, Skull, lower jaw and pectoral girdle of
Parmastega in right lateral view. b, Skull in dorsal view. c, Skull and pectoral
girdle in anterior view. d, Skull in ventral view. e, Right lower jaw ramus in mesial
view. ac, anterior coronoid; adsym, adsymphysial plate; an, anocleithrum; ang,
angular; art, articular; cla, clavicle; clei, cleithrum; cor, coracoid; de, dentary;
dpal, dermopalatine; ect, ectopterygoid; eth, ethmoid; gle, glenoid; icl,
interclavicle; ju, jugal; la, lacrimal; mc, middle coronoid; mx, maxilla; no, nostril;
orb, orbit; ob, otoccipital braincase; po, postorbital; poc, posterior coronoid;
pof, postfrontal; pospl, postsplenial; preart, prearticular; pter, pterygoid; qj,
quadratojugal; scap, scapula; spl, splenial; sq, squamosal; suf, subtemporal
fossa; sur, surangular. Vertical hatching indicates a missing element with
unknown outline; horizontal hatching indicates a damaged object with known
outline. Scale of reconstruction determined by largest individual. Scale bars,
10 mm (a–d are all shown to the same scale, which is given in a).

The dermal bone pattern of the skull roof and cheeks is, with a single
exception, characteristic of Devonian tetrapods. There is no postros- 
tral mosaic or internasal fontanelle. The median rostral is paired as in 
Acanthostega, Ventastega and Elpistostege, but unlike in Ichthyostega
and Elginerpeton in which it is single*"*?°28, A tectal bone forms the 
dorsal margin of the naris, which lies very close to the jaw margin and 
faces ventrally; the ventral margin of the naris is formed by the maxilla, 
as there is no lateral rostral. The lacrimal is excluded from the orbit by 
a long suture between the jugal and prefrontal. The latter is elongate 
and carries two bony crests, one forming the anterior part of the ‘eyebrow’
and the other an oblique ridge in front of the orbit; both are more
strongly developed in large specimens (Figs. 1m, 3a–c). The frontals
are elongate with a distinct transverse ‘step’ on the posterior part of 
the dorsal surface, marking the transition from snout to skull table. 
Intertemporals are absent. The lateral margins of the supratemporal and 
tabular form a raised spiracular margin; the tabular horn has distinct
dorsal and ventral components. A small part of the dorsal surface of the 
braincase is exposed posterior to the tabulars. The dermal ornament
of the preorbital region includes areas of irregular transverse ripples
(Fig. 1h, m, Extended Data Fig. 2), somewhat similar to the ornament of 
Umzantsia" but much coarser; elsewhere, the ornament grades into the 
conventional tetrapod ‘starburst’ ornament. Partly enclosed sensory- 
line canals are well-developed on the premaxilla, cheek bones and the 
anterior part of the nasals, but are absent from the skull table (Fig. 1d). 

Between the anterior suture for the jugal and the posterior suture 
for the preopercular, the ventral margin of the squamosal presents two 
distinct sutural margins that appear to be contacts for two bones (Fig. 1l).
The posterior of these margins must be for the quadratojugal; as the jugal 
lacks a posterior process, we tentatively infer that the anterior segment 
of the ventral margin of the squamosal contacts the maxilla (Fig. 3a). 
A squamosal-maxillary contact is characteristic for ‘fish’ members of 
the tetrapod stem-group (such as Eusthenopteron*’) and its presence 
in Parmastega is unique among tetrapods. 

The palatal morphology of Parmastega is intermediate between that of 
the elpistostegids and that of Devonian tetrapods. In the elpistostegids 
Panderichthys and Tiktaalik, the pterygoids are separated in the midline 
by a long denticulated parasphenoid*”. The vomer has a transverse
posterior margin; in Panderichthys, this margin ends mesially in a short
posterior process that extends along the lateral margin of the parasphenoid*".
This condition is broadly similar to that observed in Eusthenopteron*°.
By contrast, in Ichthyostega, Acanthostega and Ventastega, the
pterygoids meet in the midline (separating the parasphenoid from the
vomers) and the most-posterior point of the vomer is its posterolateral
corner*’®”?. In Parmastega, the parasphenoid separates the pterygoids
but is not denticulated anteriorly, and the vomeral morphology is intermediate
between that of Panderichthys and Devonian tetrapods (Figs. 1a,
3d). The pterygoid carries a longitudinal row or narrow band of denticles,
and a shorter oblique band that extends anterolaterally. Uniquely, the
ectopterygoid extends posteriorly past its contact with the pterygoid 
to contribute to the lateral margin of the subtemporal fossa (Fig. 3d). 
This relationship is demonstrated by a sutural fit of three bones from 
one individual (Fig. 1p). 

Two parts of the braincase are preserved: the ethmoid and part of 
the sphenoid in IG KSC 705/1, and the dorsal part of the otoccipital in 
IG KSC 705/17 (Fig. 1a, f, g). An ossified ethmoid is shared only with Ichthyostega
among known Devonian tetrapods®. The otoccipital has a strongly
developed pro-otic buttress, a narrow cranial cavity with small inner ears 
and a posttemporal fossa that is bounded laterally by a crista parotica 
that extends onto the tabular horn. In ventral view, the outline of the 
occipital resembles that of Tiktaalik”, but is proportionately broader in 
Parmastega. Otoccipitals that are previously known from Devonian tetra- 
pods show one of two radically different morphologies. In Acanthostega 
and Ventastega, the narrow posttemporal fossa is open laterally and the 
braincase occupies almost the whole ventral surface of the skull table; by 
contrast, the narrow braincase in Ichthyostega is flanked by large cavities
under the skull table that probably housed spiracular diverticula°”*>”. 
The otoccipital of Parmastega provides a plausible ancestral groundplan 
for both of these morphologies (Extended Data Fig. 6). 

The construction of the lower jaw is typical for tetrapods”, although
it is unusually slender and delicate (Figs. 2a–h, 3e). The only ossified
parts of the Meckelian element are the articular and the symphysis. 
The prearticular carries very few denticles but bears a large ventral 
muscle scar on its middle part. The contact between the prearticular 
and the mesial lamina of the splenial is not a tight suture, as in other 
known Devonian tetrapods’, but is instead a loose overlap that must 
have contained a ligamentous component and allowed a degree of flex- 
ibility. Fang pairs—positioned mesial to the tooth row—are present on 
the adsymphysial plate, dentary, and anterior and middle coronoids. 
Postsplenial and surangular pit lines are present. The dentary is splint- 
like and loosely attached. 

The pectoral girdle is U-shaped in anterior view, and the dorsal blades 
of the cleithra are approximately parallel (Figs. 2i–o, 3a, c). The dorsal orientation
of the anocleithrum, which we determined from well-preserved
contact surfaces, makes the girdle notably tall. The cleithrum and anocleithrum
both carry a dermal ornament, a characteristic that is otherwise
absent in known tetrapods (except for Umzantsia"). The clavicle
is narrow, and the interclavicle has a rounded corpus with a short posterior
process (Fig. 2n, o); both of these bones somewhat resemble the
corresponding elements in Ichthyostega'®, whereas Acanthostega and
Ventastega have broader clavicles and lozenge-shaped interclavicles”””. 
The scapulocoracoid is ossified in two parts: a dorsal scapular part that 
is co-ossified with the cleithrum (Fig. 2i) and a posterior coracoid ossification
that carries the glenoid (Fig. 2p). As in Ichthyostega, Elginerpeton
and Hynerpeton, the subscapular fossa is deep and has a narrow apex; 
by contrast, in Acanthostega and Ventastega the fossa is shallow and 
broad?"8?243, The limbs, pelvis, vertebrae and ribs are not preserved 
in the material from Sosnogorsk. 


Phylogenetic analysis 


We evaluated the phylogenetic position of Parmastega with maximum 
parsimony and Bayesian inference analyses, applied to a data matrix of 
26 taxa and 113 characters (Methods). The character list and data matrix
are provided in the Supplementary Information. 

The resolution of the strict-consensus, unweighted parsimony analy- 
sis was poor: all of the Devonian tetrapods (including Parmastega) 
formed a polytomy together with ‘whatcheeriid-grade’ Carboniferous 
taxa (Extended Data Fig. 7a). However, in 70% of the trees, Parmas- 
tega was the sister group to all other tetrapods. We used a range of 
approaches (character reweighting by rescaled consistency index and 
K values, and the calculation of agreement subtrees from consensus 
trees) to investigate the phylogenetic signal in the dataset (Extended 
Data Fig. 7b, c, e–h), which revealed consistent patterns. If the position
of Parmastega was resolved, it was always placed as the sister group to 
all other tetrapods; if Ventastega was resolved, it was placed immedi- 
ately crownward of Parmastega. Ichthyostega was resolved crownward 
of Acanthostega in the Adams consensus of unweighted trees, but in 
the reweighted analyses Acanthostega was crownward of Ichthyostega. 
The Bayesian tree (Extended Data Fig. 7d) also recovered these posi- 
tions for Parmastega and Ventastega, but did not resolve Ichthyostega 
and Acanthostega. 
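
The statement that Parmastega is the sister group to all other tetrapods in 70% of the trees amounts to counting how many of the shortest trees contain one particular clade. A minimal sketch in Python, assuming each tree has already been reduced to the set of clades it contains; the toy trees and taxon labels (A to D) are hypothetical and are not the trees from this analysis:

```python
from typing import FrozenSet, Iterable, Set

Clade = FrozenSet[str]


def clade_frequency(trees: Iterable[Set[Clade]], clade: Clade) -> float:
    """Return the fraction of trees that contain the given clade."""
    tree_list = list(trees)
    if not tree_list:
        raise ValueError("at least one tree is required")
    hits = sum(1 for tree in tree_list if clade in tree)
    return hits / len(tree_list)


# Hypothetical example: three toy trees over taxa A-D, each written as
# the set of clades it contains. A tree that groups B, C and D together
# places A as the sister group to the rest.
rest = frozenset({"B", "C", "D"})
toy_trees = [
    {rest, frozenset({"C", "D"})},
    {rest, frozenset({"B", "C"})},
    {frozenset({"A", "B"}), frozenset({"C", "D"})},
]
share = clade_frequency(toy_trees, rest)  # two of the three toy trees
```

The same count, applied to every clade at once, is what a majority-rule consensus summarizes.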


Discussion 


Parmastega is morphologically intermediate between the elpistostegids 
Tiktaalik, Elpistostege and Panderichthys on the one hand, and previously 
known Devonian tetrapods on the other—but primitive and derived 
characters are not evenly distributed across its anatomy. The lower jaw, 
pectoral girdle, external dermal bone pattern of the snout region, the 
absence of gular plates and the relative size of the orbits are all tetrapod- 
like, whereas elpistostegid-like characteristics persist in the palate and 
the dermal ornamentation of the cleithrum and anocleithrum. Although 
no appendage bones are known, the morphology of the pectoral girdle 
strongly suggests that Parmastega possessed limbs rather than paired 
fins. The scapulocoracoid, which forms the proximal attachment for 
many forelimb muscles and undergoes substantial changes in shape 
from elpistostegids*** to tetrapods?"*”?”*, is particularly informa- 
tive in this regard: Parmastega conforms to the tetrapod pattern. The 
shape and construction of the lower jaw, and the absence of gular plates, 
suggest that gill ventilation and prey capture worked in the same way 
as in more-crownward Devonian tetrapods. The reconfiguration of the 
palate and the loss of dermal ornament on the shoulder girdle evidently 
lagged behind these transformations. 

Until now, one of the most puzzling aspects of the anatomy of Devo- 
nian tetrapods has been the specialized ear region of Ichthyostega, which 
differs markedly from the ear regions in other early tetrapods®”°. The 
braincase of Parmastega is morphologically intermediate between that 
of Ichthyostega and those of Acanthostega and Ventastega, providing 
a plausible hypothetical ancestor for both patterns (Extended Data
Fig. 6a). However, these transformations cannot be mapped parsimoni- 
ously onto the phylogeny, indicating the presence of non-trivial homo- 
plasy either in the braincases or in other parts of the skeleton (Extended 
Data Fig. 6b). 

The three-dimensional preservation and apparent absence of post-mortem
transport make the Parmastega fossils palaeobiologically
informative. The environment of preservation, which was probably also
the living environment of Parmastega, was a coastal lagoon with brackish
water and a rich fish fauna including the placoderm Bothriolepis and
various sarcopterygians**. The concentration of the tetrapod remains in
a small area of the site (Extended Data Fig. 1) suggests that Parmastega
may have been a schooling animal. The vertebrate-bearing bed (bed 40,
the ‘fish dolomite’) is composed of two consecutive tempestites; possibly
a school of Parmastega was killed by the first storm event and their
skeletons partly disarticulated by the second. Schooling behaviour is
also implied by the mass occurrence of Acanthostega on Stensiö Bjerg
(East Greenland)*”. 

Raised orbits and a lack of lateral line canals on the skull table in Parmastega
(Fig. 3a, b) suggest that it adopted a surface-skimming position
in the water, with emergent eyes, similar to that of extant crocodilians 
(Extended Data Fig. 8). The increase in orbit size across the transition 
between fish and tetrapods has previously been linked to a shift from 
aquatic to aerial vision*®; the relative orbit size of Parmastega falls well 
within the tetrapod range (Extended Data Fig. 5) and its eyes were thus 
probably adapted for use in air. Although all known Devonian tetrapods 
have dorsally positioned eyes, Parmastega shows the most extreme 
condition (Extended Data Fig. 4). The nostrils of Parmastega face 
ventrally, which suggests that the nose was not used for breathing air 
(Extended Data Fig. 8). The dorsally placed spiracles may have taken 
on this function, as has previously been argued for Panderichthys” and
more-crownward Devonian tetrapods’. Similar to the condition in 
Ventastega, Acanthostega® and Ichthyostega’’, the lower jaw does not 
match the upper jaw in curvature in lateral or in ventral view (Extended
Data Fig. 9). 

The Parmastega material contains no vertebrae, ribs, pelvic girdles or 
limb bones. The lack of evidence for post-mortem transport, the partially 
ossified nature of the scapulocoracoid even in the largest individuals 
and the preservation of the delicate isolated coracoid ossifications 
(Fig. 2i–l, p) suggest that this absence is not a taphonomic artefact but
that it instead reflects a very lightly ossified, or even cartilaginous, axial
and appendicular skeleton. Ventastega may also have had a lightly ossi- 
fied postcranial skeleton®. Acanthostega and Ichthyostega became fully 
ossified as adults” ?”"””, but Acanthostega appears to have had a long
juvenile stage with non-ossified endoskeleton”. Functionally, the poor 
ossification of Parmastega suggests little or no capacity for terrestrial 
locomotion. This contrasts strangely with the cranial morphology, which 
suggests that the eyes were habitually held above the surface of the 
water—and thus implies some kind of engagement with the terrestrial 
environment. Even more puzzling is the fact that this poorly ossified 
postcranial skeleton is apomorphic: elpistostegids are well-ossified,
as are the majority of tetrapodomorph fishes” 0, 

Parmastega gives us the earliest detailed glimpse of a tetrapod: an 
aquatic, surface-skimming predator, just over a metre in length, living 
in a lagoon on a tropical coastal plain. Parmastega is phylogenetically
the least-crownward of all of the non-fragmentary tetrapods, but it is not
necessarily representative of the primitive conditions for the group. The 
slightly earlier Elginerpeton—which was also probably aquatic and was 
even larger than Parmastega (Extended Data Fig. 4)—had well-ossified 
girdles and limb bones, as well as a distinctive head shape with a nar- 
row snout>*°“3, Moreover, the trackway record shows that tetrapods 
originated at least 20 million years before Parmastega®”®, and the very 
existence of the trackways—which implies weight-bearing limbs, even if 
the prints were made in water—points to these forms having well-ossified 
postcranial skeletons. Together with the evidence for considerable
morphological homoplasy among Devonian tetrapods, this hints at a
tangled and still-unknown early history for limbed vertebrates. 
Methods


Preparation and illustration of specimens 

The specimens were collected from the Sosnovskiy Geological Monument,
on the right bank of the river Izhma opposite Sosnogorsk Town
(Komi Republic, Russia), during a series of field seasons from 2002 to 
2012. The bulk of the material was collected during a large-scale excavation
in 2009–2012, during which approximately 50 m² of the bone-bearing
‘fish dolomite’ bed was dug out and then broken into small blocks
using hammers, chisels, an angle grinder, drill and portable jackhammer. 
Blocks that contained parts of the same bone were glued together. The 
bones were freed from the limestone matrix using dilute (7-10%) acetic 
acid, alternating with drying and covering with the consolidants PVB 
(before 2010) and paraloid B-72 (after 2010). The reconstructions of 
the skull and lower jaw were assembled by hand on the basis of photo- 
graphs of individual bones, taken at appropriate angles. The reconstruc- 
tion of the pectoral girdle was produced by sticking together the right 
anocleithrum, cleithrum, clavicle and interclavicle of one individual, 
making a three-dimensional virtual model of the assembly using photo- 
grammetry (Agisoft PhotoScan), and importing this model into 3-matic 
(Materialise), in which it was duplicated, mirrored and assembled into a
complete girdle. The drawings of the girdle in Fig. 3 were traced directly 
from lateral and anterior projections of the model. 
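
The duplicate-and-mirror step used to complete the girdle can be illustrated on a plain triangle mesh. The sketch below is only an assumption-laden illustration in Python, not the 3-matic workflow itself: it assumes the midline is the plane x = 0 and that the half model is given as a list of vertices plus a list of triangles indexing into it; the function names are hypothetical.

```python
from typing import List, Tuple

Vertex = Tuple[float, float, float]
Triangle = Tuple[int, int, int]


def mirror_mesh(vertices: List[Vertex], triangles: List[Triangle]):
    """Mirror a triangle mesh across the plane x = 0 (assumed midline)."""
    mirrored_vertices = [(-x, y, z) for (x, y, z) in vertices]
    # A reflection reverses orientation, so two indices of each triangle
    # are swapped to keep the surface normals pointing outwards.
    mirrored_triangles = [(a, c, b) for (a, b, c) in triangles]
    return mirrored_vertices, mirrored_triangles


def assemble_bilateral(vertices: List[Vertex], triangles: List[Triangle]):
    """Combine a half model with its mirror image into a single mesh."""
    m_vertices, m_triangles = mirror_mesh(vertices, triangles)
    offset = len(vertices)  # mirrored vertices are appended after the originals
    all_vertices = list(vertices) + m_vertices
    all_triangles = list(triangles) + [
        (a + offset, b + offset, c + offset) for (a, b, c) in m_triangles
    ]
    return all_vertices, all_triangles


# Hypothetical half model: a single triangle on the right side.
half_vertices = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
half_triangles = [(0, 1, 2)]
full_vertices, full_triangles = assemble_bilateral(half_vertices, half_triangles)
```

In practice the two halves would also need to be aligned along the interclavicle before the lateral and anterior projections are traced.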


Phylogenetic analysis 
The phylogenetic position of Parmastega was evaluated with maximum 
parsimony and Bayesian-inference analyses applied to a data matrix of 
26 taxa and 113 characters (Supplementary Information), on the basis 
of a recently published matrix⁵⁰ with the addition of four characters
(character numbers 7, 27, 28 and 29). Before all analyses, we explored
the occurrence of possible ‘taxonomic equivalents’⁵¹ by subjecting the
matrix to safe taxonomic reduction using the Claddis package⁵² in the
R environment for statistical computing and graphics (https://cran.r-project.org).
No taxon was identified as being suitable for safe deletion.
For all parsimony analyses, we used PAUP* version 4.0a (build 164) 
with the following search settings. The ‘collapse branch’ option was 
enforced for branches that could possibly attain a minimum length of 
zero. Tree searches used a heuristic option with a tree bisection-recon- 
nection branch-swapping algorithm, and saving no more than a single 
tree of length greater than or equal to 1 step in each replicate, and using
a maximum of 5,000 random step-wise taxon addition replicates while
holding a single tree in memory at each step. Following this initial round
of tree searches, an additional branch-swapping round was conducted
on all trees saved in memory, this time with the option of saving multiple
trees in effect. This second round of tree searches was repeated ten
times. No shorter or additional trees were found at the end of this second
round in any of the parsimony analyses. Three analyses were carried out
under maximum parsimony, each with the settings specified above. 
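
Maximum parsimony scores a tree by the number of character-state changes it requires. As a minimal sketch, assuming a rooted binary tree and unordered, equally weighted characters, the Fitch algorithm counts those steps; this is a textbook illustration in Python, not the PAUP* implementation, and the toy tree and character codings are hypothetical:

```python
from typing import Dict, List, Set, Tuple, Union

# A tree is either a leaf name or a (left, right) pair of subtrees.
Tree = Union[str, tuple]


def fitch_steps(tree: Tree, states: Dict[str, Set[str]]) -> int:
    """Minimum number of changes for one unordered character on a tree."""

    def visit(node: Tree) -> Tuple[Set[str], int]:
        if isinstance(node, str):
            return set(states[node]), 0
        left_set, left_steps = visit(node[0])
        right_set, right_steps = visit(node[1])
        shared = left_set & right_set
        if shared:
            # The children agree on at least one state: no extra step.
            return shared, left_steps + right_steps
        # The children disagree: take the union and count one step.
        return left_set | right_set, left_steps + right_steps + 1

    return visit(tree)[1]


def tree_length(tree: Tree, characters: List[Dict[str, Set[str]]]) -> int:
    """Tree length: the sum of steps over all characters (unit weights)."""
    return sum(fitch_steps(tree, character) for character in characters)


# Hypothetical four-taxon tree and two binary characters.
toy_tree = ((("A", "B"), "C"), "D")
character_1 = {"A": {"0"}, "B": {"0"}, "C": {"1"}, "D": {"1"}}
character_2 = {"A": {"0"}, "B": {"1"}, "C": {"0"}, "D": {"1"}}
length = tree_length(toy_tree, [character_1, character_2])
```

A heuristic search such as the one described here repeats this kind of scoring over many rearranged trees and keeps the shortest ones.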
In the first analysis, all characters were treated as unordered and of 
equal unit weight. We obtained 23 shortest trees at 278 steps, with an 
ensemble consistency index of 0.5 (0.4908 excluding 5 parsimony- 
uninformative characters), an ensemble retention index of 0.6911 and 
an ensemble rescaled consistency index of 0.3456. A permutation-tail 
probability test using 1,000 replicates showed that the length of 
the shortest trees differed significantly from random (P ~ 0.001). The 
strict consensus tree (Extended Data Fig. 7a) was poorly resolved. The 
Adams consensus tree (Extended Data Fig. 7b) had greater resolution, 
and placed Parmastega and Elginerpeton as the joint (unresolved) sister
groups to all other tetrapods. The agreement subtree (a pruned topology
that included only those taxa for which all most-parsimonious trees
agreed upon mutual relationships) included 19 out of the 26 original taxa
(Extended Data Fig. 7e): Acanthostega, Dendrerpeton, Densignathus, Elginerpeton,
Greererpeton, Ossinodus and Tantallognathus were deleted.
The node support value was evaluated via bootstrapping and jackknifing®
in PAUP*, in each case using 50% character resampling and 50,000
random resampling replicates with the fast step-wise addition. In both
cases, very few nodes received support, namely post-Panderichthys
taxa, post-elpistostegalian taxa, baphetids and a clade of Eoherpeton
plus Proterogyrinus. 
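
The ensemble indices quoted for the parsimony trees are linked by standard formulas: the consistency index is CI = M/S, the retention index is RI = (G − S)/(G − M) and the rescaled consistency index is RC = CI × RI, where S is the observed tree length and M and G are the minimum and maximum conceivable numbers of steps. The Python sketch below uses these textbook definitions (the symbol names are not taken from the paper) to check that the reported values agree with one another to within rounding:

```python
def consistency_index(min_steps: float, observed_steps: float) -> float:
    """CI = M / S: minimum conceivable steps over observed steps."""
    return min_steps / observed_steps


def retention_index(min_steps: float, max_steps: float, observed_steps: float) -> float:
    """RI = (G - S) / (G - M), with G the maximum conceivable steps."""
    return (max_steps - observed_steps) / (max_steps - min_steps)


def rescaled_consistency_index(ci: float, ri: float) -> float:
    """RC = CI * RI."""
    return ci * ri


# CI and RI reported for the unweighted analysis (reported RC: 0.3456).
rc_unweighted = rescaled_consistency_index(0.5, 0.6911)
# CI and RI reported for the reweighted analysis (reported RC: 0.5645).
rc_reweighted = rescaled_consistency_index(0.6804, 0.8297)
# Minimum step count implied by CI = 0.5 at 278 steps; this assumes the
# reported 0.5 is exact rather than rounded.
implied_min_steps = 0.5 * 278
```

Both products match the published rescaled consistency indices to four decimal places once rounding of the inputs is allowed for.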

In the second analysis, characters were re-weighted by the largest 
values of their rescaled consistency indexes from the initial analysis. 
PAUP* yielded a single tree (Extended Data Fig. 7c) that was 112.3561 steps 
long, with an ensemble consistency index of 0.6804 (0.6655 excluding 
uninformative characters), an ensemble retention index of 0.8297 and 
an ensemble rescaled consistency index of 0.5645. This tree was three 
steps longer than the trees from the unweighted analysis and did not 
represent a significantly better fit for the data, in terms of tree length, 
than the unweighted trees (based upon Templeton, Kishino-Hasegawa, 
and winning-sites tests in PAUP*). The weighted analysis confirmed the 
status of Parmastega as the most-basal tetrapod. 

In the third analysis, we used implied weighting”, experimenting with
different integer values for Goloboff’s constant of concavity K. We ran 
analyses with 1< K<10 (for example, ref. °’). For each K value, we saved 
all trees that were generated at the end of the analysis. The separate 
tree files obtained from all K-weighted analyses were stored in PAUP* 
after filtering out duplicated tree topologies. This process resulted in 
five K-weighted trees, which were summarized with a strict consensus 
(Extended Data Fig. 7f), an agreement subtree (Extended Data Fig. 7g) 
and an Adams consensus (Extended Data Fig. 7h). The agreement subtree
included 22 taxa: Densignathus, Elginerpeton, Metaxygnathus and
Ossinodus were deleted. 
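
Implied weighting down-weights characters according to how much homoplasy they show on a given tree. In Goloboff's formulation, a character with h extra steps receives the fit K/(K + h), and trees with the highest total fit are preferred; smaller values of K penalize homoplasy more strongly. A minimal Python sketch of this fit function follows, with hypothetical extra-step counts; the way PAUP* scales or reports these values may differ.

```python
from typing import Iterable


def implied_weight_fit(extra_steps: int, k: float) -> float:
    """Fit of one character with `extra_steps` of homoplasy: K / (K + h)."""
    if k <= 0:
        raise ValueError("the concavity constant K must be positive")
    if extra_steps < 0:
        raise ValueError("extra steps cannot be negative")
    return k / (k + extra_steps)


def total_fit(extra_steps_per_character: Iterable[int], k: float) -> float:
    """Total fit of a tree: the sum of the per-character fits."""
    return sum(implied_weight_fit(h, k) for h in extra_steps_per_character)


# Hypothetical tree on which three characters show 0, 1 and 3 extra steps.
toy_extra_steps = [0, 1, 3]
fit_k3 = total_fit(toy_extra_steps, k=3)  # 1 + 3/4 + 3/6
```

Running the same tree under several values of K, as in the analysis described here, shows how sensitive the preferred topology is to the strength of the penalty.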

For the Bayesian inference analysis, we used MrBayes v. 3.2.6 (ref.°”), 
with the following settings: variable coding; gamma-distributed rate 
model; 10’ generations and four chains; and discarding the first 25% 
of sampled trees. The convergence diagnostic was evaluated through 
inspection of the potential scale reduction factor values60 output by 
MrBayes. These values approached or were identical to 1, indicating 
successfully convergent runs (Supplementary Information). Credibility 
values for nodes in the Bayesian results (Extended Data Fig. 7d) were 
moderate-to-strong for most nodes. 
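
For readers unfamiliar with the convergence diagnostic mentioned above, the following sketch computes a basic potential scale reduction factor for one parameter sampled in several chains of equal length. It is a simplified illustration of the Gelman-Rubin idea rather than the exact MrBayes calculation; the sample values and function name are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def potential_scale_reduction_factor(chains):
    """Basic PSRF for equal-length chains of samples of one parameter."""
    n = len(chains[0])
    chain_means = [mean(chain) for chain in chains]
    # W: average of the within-chain sample variances.
    within = mean([variance(chain) for chain in chains])
    # B: between-chain variance, scaled by the chain length.
    between = n * variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return sqrt(pooled / within)


# Hypothetical post-burn-in samples from two runs.
agreeing = [[1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 2.0]]
disagreeing = [[1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0]]

print(potential_scale_reduction_factor(agreeing))
print(potential_scale_reduction_factor(disagreeing))
```

Values close to 1 suggest that the chains are sampling the same distribution; values well above 1 indicate that they have not yet converged.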


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 
In total, 132 specimens comprising 183 skeletal elements were collected 
during the entire period of excavations (2002-2012). One hundred 
and six specimens (all of them figured in Supplementary Table 1) 
have been deposited in the collection of the Institute of Geology, 
Komi Science Centre (Ural Branch of the Russian Academy of 
Sciences, Syktyvkar, Russia) under collection number IG KSC 705/. 
One specimen has been deposited in the Ukhta Local Museum under 
collection number ULM 2599. The IG KSC and ULM specimens are 
available for examination. Other specimens have been reserved for 
sharing with other museums. The Life Science Identifier for Parmastega 
is urn:lsid:zoobank.org:act:76B5BB03-42FE-4F46-A284-F95E973CEE96. 


50. Chen, D. et al. A partial lower jaw of a tetrapod from “Romer’s Gap”. Earth Env. Sci. Trans. 
R. Soc. Edinb. 108, 55-65 (2018). 

51. Wilkinson, M. Majority-rule reduced consensus trees and their use in bootstrapping. Mol. 
Biol. Evol. 13, 437-444 (1996). 

52. Lloyd, G. T. Estimating morphological diversity and tempo with discrete character-taxon 
matrices: implementation, challenges, progress, and future directions. Biol. J. Linn. Soc. 
118, 131-151 (2016). 

53. Swofford, D. L. PAUP* Phylogenetic Analysis Using Parsimony (*and Other 
Methods) Version 4 (Sinauer, 2003). 

54. Wilkinson, M., Peres-Neto, P. R., Foster, P. G. & Moncrieff, C. B. Type 1 error rates 
of the parsimony permutation tail probability test. Syst. Biol. 51, 524-527 
(2002). 

55. Felsenstein, J. Confidence limits on phylogenies: an approach using the bootstrap. 
Evolution 39, 783-791 (1985). 


56. Farris, J. S., Albert, V. A., Källersjö, M., Lipscomb, D. & Kluge, A. G. Parsimony jackknifing 
outperforms neighbor-joining. Cladistics 12, 99-124 (1996). 

57. Goloboff, P. A. Estimating character weights during tree search. Cladistics 9, 83-91 (1993). 

58. Congreve, C. R. & Lamsdell, J. C. Implied weighting and its utility in palaeontological 
datasets: a study using modelled phylogenetic matrices. Palaeontology 59, 447-462 
(2016). 

59. Ronquist, F. & Huelsenbeck, J. P. MrBayes 3: Bayesian phylogenetic inference under 
mixed models. Bioinformatics 19, 1572-1574 (2003). 

60. Gelman, A. & Rubin, D. B. Inference from iterative simulation using multiple sequences. 
Stat. Sci. 7, 457-472 (1992). 

61. Blom, H. Taxonomic revision of the Late Devonian tetrapod Ichthyostega from East 
Greenland. Palaeontology 48, 111-134 (2005). 

62. Ahlberg, P. E., Friedman, M. & Blom, H. New light on the earliest known tetrapod jaw. J. 
Vertebr. Paleontol. 25, 720-724 (2005). 

63. Robinson, J. The Evolution of the Early Tetrapod Middle Ear and Associated Structures. 
PhD thesis, Univ. College London (2006). 


Acknowledgements We thank Y. Gatovsky, A. Zhuravlev and D. Ponomarev for their support of 
the project, and the dig crews of the 2009-2012 excavations for all their hard work. A. Ivanov 
identified the first tetrapod mandible from Sosnogorsk, in the Chernyshov Collection. P.A.B. 
acknowledges the support of National Geographic Society grant 9099-12 and UNDP/GEF 
project no. 00059042. E.L. acknowledges the support of Latvian Council of Science grant 
Z-6153-110. P.E.A. acknowledges the support of a Wallenberg Scholarship from the Knut and 
Alice Wallenberg Foundation. 


Author contributions P.A.B. initiated and directed the excavation programme at Sosnogorsk, 
which produced the material for the study. E.L. and P.E.A. participated in excavations. P.A.B. 
carried out all preparation, consolidation and photography of specimens. P.E.A. made the 
reconstructions of the skull, lower jaw and shoulder girdle. M.R. performed the phylogenetic 
analyses. P.E.A. made Figs. 1-3 and Extended Data Figs. 2, 4-9. P.A.B. made Extended Data 
Figs. 1, 3, as well as Supplementary Tables 1, 2. P.A.B., J.A.C., E.L., M.R. and P.E.A. participated in 
the interpretation of the material and the writing of the paper. 


Competing interests The authors declare no competing interests. 


Additional information 

Supplementary information is available for this paper at https://doi.org/10.1038/s41586-019-1636-y. 

Correspondence and requests for materials should be addressed to P.E.A. 

Peer review information Nature thanks Nadia Frobisch and the other, anonymous, reviewer(s) 
for their contribution to the peer review of this work. 

Reprints and permissions information is available at http://www.nature.com/reprints. 




Extended Data Fig. 1 | The distribution of Parmastega at the Sosnogorsk 
fossil site. a, b, Maps of increasing resolution, showing the location of 
Sosnogorsk within northwest Russia. The box around Ukhta and Sosnogorsk in a 
indicates the region shown in b. In b, the brown belt that extends from north to 
south indicates the outcrop of Famennian (D3fm) deposits in the region, and the 
yellow arrow points to the Sosnogorsk fossil site (Sosnovskiy Geological 
Monument). c, Stratigraphic column through the Sosnogorsk Formation, and 
part of the overlying marine Izhma Formation. Note the possible position of the 
Frasnian-Famennian boundary (D3f-D3fm) in the lower part of the Sosnogorsk 
Formation. The vertebrate-bearing part of the formation is shown in detail on 
the right; the tetrapod-bearing level is indicated with a red vertical bar. 
d, General view of outcrop no. 20 (Sosnovskiy Geological Monument) from the 
opposite bank of the Izhma River. 1, limestone; 2, dolomite; 3, clay; 4, nodular 
limestone; 5, scree; and 6, landslide. D3sn, Sosnogorsk Formation; D3iz, Izhma 
Formation. The distance A’-B’ indicates the area of the main excavation that 
took place in 2010-2012. e, Main excavation. The distance A-B indicates the area 
in which all of the tetrapod bones were found, during the excavation in 2012. The 
photograph was taken on 2 August 2012. f, Sketch map of the main excavation 
(2012), showing the distribution of tetrapod bones within the bed. The cluster 
numbers are indicated in orange. The background maps in a and b were taken 
from https://yandex.ru/maps; the geological features of b were taken from the 
open-access State Geological Map at https://vsegei.ru/. 
Extended Data Fig. 2 | Frontal bones of Parmastega. This figure shows all of 
the complete and near-complete frontals of Parmastega (eight out of nine 
known frontals) to scale, oriented with anterior at the top and aligned on the 
centre of radiation (horizontal line). The right frontals have been reversed so 
that all bones have the appearance of left frontals. From left to right, the 
specimens are IG KSC 705/3 (reversed), IG KSC 705/40, IG KSC 705/44 (reversed), 
IG KSC 705/43, IG KSC 705/45, IG KSC 705/18 (reversed), IG KSC 705/42 and 
IG KSC 705/41. Scale bar, 10 mm. 
Extended Data Fig. 3 | Bone associations. a, b, Diagrammatic images showing 
the associated bones (in orange) of two individual skulls. a, The holotype 
IG KSC 705/1. b, The largest individual, IG KSC 705/2-705/14 and IG KSC 705/99. 
In the lateral view of b, the preserved frontal and nasal are shown (even though 
they are in fact on the other side of the skull). c, Diagrammatic representation of 
the number of specimens of different bones in the sample. 
Extended Data Fig. 4 | Size and shape of Devonian tetrapods. Silhouette 
reconstructions of the heads of known, reconstructable Devonian tetrapods. 
Reconstructions are drawn to the same scale. The lower jaw of Elginerpeton (the 
largest known Devonian tetrapod, and for which the skull cannot be 
reconstructed) is also included. All reconstructions except for Acanthostega are 
assembled from more than one specimen; the specimen numbers indicate the 
specimen used to determine the scale. The right-hand column shows the largest 
known individuals. The left-hand column shows the smallest individuals of 
Parmastega (all from Sosnogorsk) and Ichthyostega (based on the entire East 
Greenland collection, reviewed in ref. “). Note the similarity in size range despite 
the very different nature of the samples. Ventastega and Acanthostega show 
narrow size ranges, which are not illustrated. Reconstructions modified from 
the following sources: Ichthyostega, ref.°; Acanthostega, ref.*!; Ventastega, 
ref. *; Elginerpeton, ref. . 
Extended Data Fig. 5 | Relative orbit size. Plot of orbit length versus skull 
length for a range of tetrapodomorph fishes, elpistostegids, Devonian 
tetrapods and post-Devonian tetrapods. Data are taken from ref. ””, except 
Parmastega, which is based on the largest known individual (Extended Data 
Fig. 3). Post-Devonian tetrapods from ref.” not included in our phylogenetic 
analysis are not shown. Ac, Acanthostega; Ba b, Baphetes bohemicus; Ba k, 
Baphetes kirkbyi; Ba l, Baphetes lintonensis; Bal, Balanerpeton; Be, Beelarongia; 
Br, Bruehnopteron; Cab, Cabonnichthys; Can, Canowindra; Cl, Cladarosymblema; 
Cra, Crassigyrinus; Den, Dendrerpeton; Ed, Edenopteron; Elp, Elpistostege; Eoh, 
Eoherpeton; Eu, Eusthenopteron; Gog, Gogonasus; Goo, Gooloogongia; Gre, 
Greererpeton; Gy, Gyroptychius; He, Heddleichthys; Ich, Ichthyostega; Ko, 
Koharalepis; Man, Mandageria; Mar, Marsdenichthys; Meg, Megalocephalus; Oss, 
Ossinodus; Ost, Osteolepis; Pal, Palatinichthys; Pan, Panderichthys; Par, 
Parmastega; Ped, Pederpes; Pro, Proterogyrinus; Scr, Screbinodus; Sil, 
Silvanerpeton; Tik, Tiktaalik; Tin, Tinirau; Ven, Ventastega; Wha, Whatcheeria. 
Extended Data Fig. 6 | Otoccipital morphologies of Devonian tetrapods. a, 
Comparative diagram of the otoccipital regions of Parmastega, Ichthyostega 
(new reconstruction, based on data from refs. 6°), Ventastega (modified from 
ref. *) and Acanthostega (modified from ref. °, semicircular canals modified 
from ref.) in ventral view. The basioccipital-exoccipital complex is preserved 
only in Ichthyostega and Acanthostega; in these genera the inner ear is shown 
only on one side. Drawings are scaled to the same length from pineal region to 
posterior margin of otic capsule. The inner ear is represented by the grooves for 
the anterior and posterior oblique semicircular canals, except in Ichthyostega in 
which it is represented by the sacculus (modified from ref. 7°). The braincases are 
arranged by morphological similarity, so that a minimum number of 
transformations are required along each branch. b, Consensus phylogeny from 
the analyses presented in this paper. The phylogenetic topology does not match 
the similarity dendrogram. 

Figure labels: Ichthyostega: large spiracular cavities, ‘posttemporal fossae’. 
Parmastega: moderate-sized spiracular cavities and inner ears, posttemporal 
fossae enclosed laterally (under skull roof). Ventastega (top) and Acanthostega 
(bottom): small spiracular cavities, large inner ears, posttemporal fossae open 
laterally. Key: posttemporal fossa (under braincase); posttemporal fossa (under 
skull roof); inner ear. 
Extended Data Fig. 7 | Phylogenetic analysis. a, Unweighted strict-consensus 
tree. b, Unweighted Adams consensus tree. c, Single tree resulting from 
reweighting characters by the rescaled consistency index. d, Bayesian tree, with 
credibility values at nodes. e, Maximum-agreement subtree of unweighted 
Extended Data Fig. 8 | Parmastega and caiman. Comparison in left lateral view 
of spectacled caiman (Caiman crocodilus) on the left and Parmastega on the 
right, drawn to the same size, showing the inferred similar cruising posture at 
the surface. Note the difference in the positions of the nostrils. The caiman 
image is based on a computed tomography scan of a skull in the Digimorph 
Archive (http://www.digimorph.org/specimens/Caiman_crocodilus/). 
Extended Data Fig. 9 | Fit of dentary against upper jaw. a, Dentary of 
Parmastega (IG KSC 705-67) fitted against palatal reconstruction to show the 
difference in curvature between the spade-shaped snout and the relatively 
straight dentary. b, Lateral view of skull reconstruction of Parmastega with 
closed mouth, showing mismatch in curvature between upper and lower jaws. c, 
Composite reconstruction of Ventastega, superimposing lower jaw rami (from 
ref.*°) on skull reconstruction (from ref. **), showing shape relationship similar 
to a. Not to scale. 
Statistics 


For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section. 


n/a | Confirmed 


The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement 


A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly 


The statistical test(s) used AND whether they are one- or two-sided 
Only common tests should be described solely by name; describe more complex techniques in the Methods section. 


A description of all covariates tested 


A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons 


A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) 
AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals) 


For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted 
Give P values as exact values whenever suitable. 


For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings 


For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes 


Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated 


Our web collection on statistics for biologists contains articles on many of the points above. 


Software and code 


Policy information about availability of computer code 


Data collection Materialise 3-matic Research 12.0 (software for manipulating three-dimensional virtual objects in space; used for constructing model of 
pectoral girdle of Parmastega). 
Data analysis PAUP* version 4.0a and MrBayes version 3.2.6 (for phylogenetic analysis) 


For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers. 
We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information. 
Policy information about availability of data 
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: 


- Accession codes, unique identifiers, or web links for publicly available datasets 
- A list of figures that have associated raw data 
- A description of any restrictions on data availability 


All specimens figured and described in the paper are accessioned to the Institute of Geology, Komi Science Centre, Ural Branch of the Russian Academy of Sciences, 
Syktyvkar, Russia, and are deposited there. The accession code is IG KSC 705/. All the specimens are available for examination. 
Ecological, evolutionary & environmental sciences study design 


All studies must disclose on these points even when the disclosure is negative. 


Study description Description of fossil material of the Devonian tetrapod Parmastega aelidae. 


Research sample All known specimens of this taxon. 


Sampling strategy We excavated the fossil locality (Sosnovskiy Geological Monument, at Sosnogorsk on the bank of the Izhma River) and collected all 
the fossils we could find. The fossils were freed from the rock with dilute acetic acid by Pavel Beznosov. 


Data collection The primary interpretation of the fossils and the assembly of the reconstruction were undertaken by Pavel Beznosov and Per 
Ahlberg, working with the specimens in Syktyvkar during a series of visits by Per Ahlberg. 


Timing and spatial scale Excavations were carried out during 2002-2012. 


Data exclusions No data were excluded. 


Reproducibility Not applicable. 


Randomization Not applicable. 


Blinding Not applicable. 


Did the study involve field work? Yes No 


Field work, collection and transport 


Field conditions Typical summer weather in northern Russia. The weather conditions had no impact on data gathering. 


Location Sosnovskiy Geological Monument, Sosnogorsk, right bank of Izhma River, Komi Republic, Russia. 


Access and import/export The fieldwork was carried out by the Geological Institute of the Komi Science Center, Uralian Branch of the Russian Academy of 
Sciences, in accordance with local and national regulations. The material was not exported from Russia. 


Disturbance Blocks of limestone were removed from the riverbank. The annual ice-melt and spring flood of the Izhma River vigorously scours 
the banks and soon removes any trace of human disturbance. 


Reporting for specific materials, systems and methods 


We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, 
system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response. 
Palaeontology 


Specimen provenance The specimens come from the Sosnovskiy Geological Monument, Sosnogorsk, right bank of Izhma River, Komi Republic, Russia, 
and were collected by the Geological Institute of the Komi Science Center, Uralian Branch of the Russian Academy of Sciences in 
accordance with local and national regulations. 


Specimen deposition Geological Institute of the Komi Science Center, Uralian Branch of the Russian Academy of Sciences 


Dating methods Not applicable 


Tick this box to confirm that the raw and calibrated dates are available in the paper or in Supplementary Information. 
The colorectal adenoma-carcinoma sequence has provided a paradigmatic framework 
for understanding the successive somatic genetic changes and consequent clonal 
expansions that lead to cancer’. However, our understanding of the earliest phases of 
colorectal neoplastic changes (which may occur in morphologically normal tissue) is 
comparatively limited, as for most cancer types. Here we use whole-genome 
sequencing to analyse hundreds of normal crypts from 42 individuals. Signatures of 
multiple mutational processes were revealed; some of these were ubiquitous and 
continuous, whereas others were only found in some individuals, in some crypts or 
during certain periods of life. Probable driver mutations were present in around 1% of 
normal colorectal crypts in middle-aged individuals, indicating that adenomas and 
carcinomas are rare outcomes of a pervasive process of neoplastic change across 
morphologically normal colorectal epithelium. Colorectal cancers exhibit 
substantially increased mutational burdens relative to normal cells. Sequencing 
normal colorectal cells provides quantitative insights into the genomic and clonal 
evolution of cancer. 


Sequencing of the genomes of over 20,000 cancers of several types has 
identified the repertoire of driver mutations in cancer genes that convert 
normal cells into cancer cells and revealed the mutational signatures 
of the underlying biological processes that generate somatic muta- 
tions”. Cancers are, however, end stages of an evolutionary process 
that operates within populations of cells, and commonly arise through 
the accumulation of several driver mutations that engender a series of 
clonal expansions. Understanding this progression has depended on 
the identification of somatic mutations in morphologically abnormal 
neoplastic proliferations that represent intermediate stages between 
normal cells and cancer cells! 

As for most cancer types, the earliest stages of progression to colo- 
rectal cancer remain less well understood. The driver mutation that first 
sets a colorectal epithelial cell on the path to cancer is probably caused 
by mutational processes that also operate in normal cells and that we 
only understand to a limited extent. The nature and numbers of the 
earliest neoplastic clones with driver mutations—which conceivably 
are morphologically indistinguishable from normal cells—are simi- 
larly unclear. In large part, these deficiencies are due to the technical 
challenge of identifying somatic mutations in normal tissues, which 
are composed of myriad microscopic cell clones. Several different 
approaches have been adopted to address this challenge*™, and have 
revealed the signatures of common somatic mutational processes 
in normal cells of the small and large intestine, liver, blood, skin and 


nervous system. Thus far, however, studies have not been of sufficient 
scale to characterize variation in signature activity or detect processes 
that occur less frequently* “. High proportions of normal skin, oesopha- 
geal and endometrial epithelial cells have been shown to be members 
of clones that already carry driver mutations", and large mutant 
clones have been detected in the blood” *°. The extent of this phe- 
nomenon in the colon, an organ with a high incidence of cancer, has 
not been investigated. 

Colonic epithelium is a contiguous cell sheet organized into around 
15 million crypts, each of which is composed of about 2,000 cells”. 
Towards the base of each crypt, a small number of stem cells are 
found, which are ancestral to the maturing and differentiated cells 
in the crypt**. These stem cells stochastically replace one another 
through a process of neutral drift”*”*, such that all stem cells—and 
thus all cells—in a crypt derive from a single ancestral stem cell that 
existed in recent years” ”’. The somatic mutations that were present in 
this ancestor are thus found in all of the approximately 2,000 descendant 
cells and can be revealed by DNA sequencing of an individual crypt. 
These stem cells are thought to be the cells of origin of colorectal 
cancers”®. To characterize the earliest stages of colorectal carcino- 
genesis, we examined somatic mutational burdens, mutational sig- 
natures, clonal dynamics and the frequency of driver mutations in 
normal colorectal epithelium by sequencing individual colorectal 
crypts. 
Fig. 1 | Mutational signatures that are present in normal colon. a, Examples of 
an SBS, a DBS and an ID signature (SBS18, DBS8 and ID1), showing the categories 
into which mutations are divided. Later figures are shown in the same format. 


Somatic mutations and mutational signatures 


We used laser-capture microdissection to isolate 2,035 individual colonic 
crypts from the normal epithelium of 42 individuals aged 11 to 78, of 
whom 15 had a history of colorectal cancer and 27 did not (Methods, 
Supplementary Table 1), and sequenced their genomes. The distribu- 
tion of variant allele fractions (VAFs) from whole-genome sequencing 
of 571 individual crypts showed that crypts were derived from a single 
ancestral stem cell (Extended Data Fig. 1d), and simulations indicated 
that about 90% of the mutations called were fully clonal (Supplementary 
Information). There was substantial variation in mutational burdens 
between individual crypts (for example, the mutational burden ranged 
from 1,508 to 15,329 for individuals in their sixties) and this was not 
obviously attributable to technical factors. To explore the biological basis 
of this variation we extracted mutational signatures and estimated the 
contribution of each to the mutational burden of every crypt (Methods, 
Supplementary Information). 
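
The clonality argument above rests on variant allele fractions. As a minimal sketch, assuming a diploid genome in which a heterozygous mutation carried by every cell of a crypt is expected at a VAF near 0.5, the Python below computes VAFs from read counts and applies a simple median check. The read counts, tolerance and function names are hypothetical and are not taken from the paper's pipeline.

```python
from statistics import median


def variant_allele_fraction(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads at a site that support the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def consistent_with_clonal(vafs, expected=0.5, tolerance=0.1) -> bool:
    """Crude check: is the median VAF near the heterozygous expectation?"""
    return abs(median(vafs) - expected) <= tolerance


# Hypothetical (alt, total) read counts for mutations called in one crypt.
read_counts = [(14, 30), (16, 31), (13, 28), (15, 29)]
vafs = [variant_allele_fraction(alt, total) for alt, total in read_counts]
print(consistent_with_clonal(vafs))
```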

Nine single-base substitution (SBS), six doublet-base substitution 
(DBS) and five small insertion and deletion (indel) (ID) mutational 
signatures were found. Of these, 14 closely matched (Methods) a known 
reference signature (SBS1, SBS2, SBS5, SBS13, SBS18, DBS2, DBS4, DBS6, 
DBS8, DBS9, DBS11, ID1, ID2 and ID5; nomenclature as described 
previously’) and six did not (SBSA, SBSB, SBSC, SBSD, IDA and IDB) (Fig. 1, 
Extended Data Figs. 2-4). Thus, new mutational signatures were 
extracted despite extensive previous analysis of cancers, perhaps owing 
to masking by the comparative complexity of the mixes of signatures 
that are present in cancer genomes. 

Fig. 1 | b, The complement of signatures in normal colonic epithelium. Known signatures 
are labelled according to their nomenclature in the Pan Cancer Analysis of Whole 
Genomes (PCAWG)’ and novel signatures are labelled with letters. 
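
Single-base substitution signatures are defined over categories of mutation. The sketch below shows one common convention, assumed here rather than confirmed by the text: each substitution is reported relative to the pyrimidine base of the pair, together with its immediate 5' and 3' neighbours, which yields 96 categories. The function names are hypothetical.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def sbs_category(context: str, alt: str) -> str:
    """Pyrimidine-centred label for a substitution in a trinucleotide context."""
    if len(context) != 3 or context[1] == alt:
        raise ValueError("need a trinucleotide and a different alternate base")
    if context[1] in "AG":
        # Purine reference base: describe the change on the opposite strand.
        context = reverse_complement(context)
        alt = COMPLEMENT[alt]
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


# Enumerate every possible substitution and count the distinct labels.
all_categories = {
    sbs_category("".join(ctx), alt)
    for ctx in product("ACGT", repeat=3)
    for alt in "ACGT"
    if alt != ctx[1]
}
print(len(all_categories))       # 96
print(sbs_category("ACG", "T"))  # A[C>T]G
```

Under this labelling, a C>T change at an NCG trinucleotide and the equivalent G>A change on the opposite strand fall into the same category.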


Ubiquitous mutational signatures 

Eleven signatures (SBS1, SBS5, SBS18, DBS2, DBS4, DBS6, DBS9, DBS11, 
ID1, ID2 and ID5) were found in over 85% of crypts and are here termed 
‘ubiquitous’. All have been previously described?. 

SBS1 is characterized by C>T substitutions at NCG trinucleotides (the 
mutated base is underlined) and is probably a result of the deamination 
of 5-methylcytosine. Its mutational load correlated linearly with 
age (Fig. 2). There was, however, variation in SBS1 mutational burdens 
between crypts from the same individual (P= 2.25 x 10”). This was, in 
part, due to different SBS1 mutation rates in different sectors of the 
colon, with mean rates across individuals of 16.8 mutations per year 
(95% confidence interval, 15.2-18.3) in the right colon (ascending and 
caecum), 16.1 (95% confidence interval, 14.4-17.5) in the transverse colon 
and 12.8 (95% confidence interval, 11.1-14.4) in the left colon (descending 
and sigmoid). The SBS1 mutation rate in the terminal ileum was 12.7 
(95% confidence interval, 10.6-14.9) (Supplementary Information). 
SBS5 is a relatively featureless signature of unknown cause and SBS18 
is characterized by C>A mutations, which may be caused by damage to 
DNA by reactive oxygen species”’*’. The mutational burdens of these 
two signatures correlated with age, with the same ordering of sector 
differences as SBS1 (P= 9.89 10 ~ for SBS5; P=5.43x 10 ~ for SBS18). 
Even after taking anatomical location and age into account, differences 
in mutational burden remained between different crypts, notably for 
SBS18 (Fig. 2, Extended Data Fig. 9, Extended Data Fig. 6al). Combining 
ubiquitous SBS mutational signatures, and averaging over anatomical 
sites, the rate of mutation was 43.6 mutations per year, which is 
comparable with previous estimates’. 

Fig. 2 | Mutational burden versus age for every signature. For every signature, the median (horizontal bar) and range (vertical bar) in mutational burden for all the 
crypts from each individual are shown. Each individual is coloured differently. n = 445 crypts from 42 individuals. 

DBS2, DBS4, DBS6, DBS9 and DBS11 were tightly correlated in all 
colonic crypts. ID1, ID2 and ID5, which are characterized by indels of a 
single T and may be the consequence of slippage during DNA replication, all 
accumulated linearly with age, with the same order of sector differences 
as SBS1 (P=1.66 x 10° for ID1, P=4.53 x10 for ID2 and P= 4.53 x 10° for 
ID5) (Supplementary Information, Extended Data Fig. 5). 

The correlations of ubiquitous signatures with age indicate that the 
mutational processes that underlie them operate throughout life, in 
all individuals and all colorectal stem cells. However, the results also 
suggest that differences in physiology and/or microenvironment (and, 
potentially, the age of the most recent common ancestor of the crypts”) 
between different sectors of the colon cause measurable differences in 
somatic mutation rates. 
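
A per-year mutation rate of the kind quoted above can be read as the slope of mutational burden against age. The sketch below fits an ordinary least-squares line to made-up data points constructed to lie on a 16.8 mutations-per-year line. It is a simplified stand-in: the text does not state the statistical model that was used, and the ages, burdens and function name are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical ages (years) and SBS1 burdens (mutations per crypt).
ages = [20, 40, 60, 80]
burdens = [336, 672, 1008, 1344]

slope, intercept = fit_line(ages, burdens)
print(f"{slope:.1f} mutations per year")
```

With real data, crypts from the same individual are not independent observations, so a model that accounts for that grouping would be more appropriate than this plain regression.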


Sporadic mutational signatures 


Nine signatures (SBS2, SBS13, SBSA, SBSB, SBSC, SBSD, DBS8, IDA and 
IDB) were present only in a subset of individuals and/or a subset of crypts 
and are here termed ‘sporadic’. All were novel, except for SBS2, SBS13 
and DBS8. SBS2 and SBS13 are characterized by C>T and C>G mutations 
at TCN, are probably caused by cytidine deaminases of the APOBEC family 
and usually occur together". They were present in only two crypts 
(a colonic crypt (Extended Data Fig. 6ai) and an ileal crypt (Extended 
Data Fig. 6ao) from different individuals), occurring together and each 
accounting for over 150 mutations. To our knowledge, this is the first 
report that DNA editing of the human genome by APOBEC cytidine 
deaminases occurs in normal cells in vivo. The sequence context of 
these mutations in normal colon suggests that APOBEC3A is the major 
contributing enzyme”. 

534 | Nature | Vol 574 | 24 OCTOBER 2019 

Four SBS signatures that do not match the reference set, SBSA, SBSB, 
SBSC and SBSD, were found in normal colorectal cells (SBSA has recently 
been reported in an oral squamous carcinoma”). SBSA is characterized 
by T>C mutations at ATA, ATT and TTT, and T>G mutations at TTT. Its 
mutational burden correlated closely with that of IDA, in which sin- 
gle T deletions in short runs of T bases (with a modal average of four) 
predominate—suggesting that these two signatures are a result of the 
same underlying mutational process. SBSA was detectable in 29 out of 
42 individuals and often accounted for thousands of mutations in just 
a subset of crypts. It clustered spatially in the colon, with crypts from 
the same biopsy exhibiting the signature even though the mutations 
themselves were not shared (Supplementary Information, Extended 
Data Fig. 9). Around 2.5-fold more T>C mutations occurred when the 
T base was on the transcribed than on the untranscribed strand. 
Transcriptional-strand bias is often caused by transcription-coupled 
nucleotide excision repair acting on DNA that has been damaged by 
exogenous exposures that cause covalently bound bulky adducts; however, 
it can also occur as a result of transcription-coupled DNA damage”. 
Assuming that one of these two possibilities is the case, damage to ade- 
nine underlies SBSA. To investigate the timing of SBSA, we constructed 
phylogenetic trees of mutations and established the mutational signa- 
tures in each branch (Fig. 3, Extended Data Fig. 6). SBSA was confined to 
early branches of these phylogenies (when these were available for analysis) 
(Fig. 3b, Extended Data Fig. 6f, h, z, aa, am, ao, aq). Using the number 
of SBS1 mutations as indicators of real time, the mutational process 
that underlies SBSA appears to be active before an individual reaches
10 years of age (Supplementary Methods, Extended Data Fig. 6aq). SBSA
may therefore be caused by an extrinsic, locally acting and patchily
distributed mutagenic insult that occurs during childhood.

[Fig. 3 caption fragment: ... dominated by ubiquitous signatures (a), SBSA and IDA (b), SBSB, DBS8 and
IDB (c) or SBSD for the individual treated with chemotherapy (d).]

SBSB was characterized by C>T mutations at ACA, T>A at CTN
and T>G at GTG, and was present in subsets of crypts from four individuals
(Extended Data Fig. 6e, aa, ai, aj). It accounted for variable
numbers of substitutions, with a maximum of 3,002 in one crypt 
(Fig. 3c, Extended Data Fig. 6ai). In the two individuals in whom SBSB 
could be timed (Extended Data Fig. 6aa, ai, aq), it appeared, as with
SBSA, to be most active in the first decade of life. SBSB correlated
with DBS8 and IDB (Fig. 3c, Extended Data Fig. 9), suggesting that 
they are caused by the same underlying mutational process. DBS8 
is composed of AC>CA and AC>CT mutations and has previously 
been reported in rare hypermutated cancers with no obvious cause’. 
IDB is dominated by deletion of a single T that has no other T bases 
surrounding it. 

SBSC is characterized by one C>T mutation in CC dinucleotides. It 
primarily affected three crypts (with 1,050, 827 and 695 mutations, 
respectively) from the left colon of one individual with an unremarkable 
history (Extended Data Fig. 9, Extended Data Fig. 6m, Supplementary 
Table 1). 

All crypts from a 66-year-old man carried many thousands of muta- 
tions of SBSD (Fig. 3d, Extended Data Fig. 6ap), which is characterized 
by T>A substitutions with a transcriptional-strand bias that is compat- 
ible with damage to adenine. This individual had been treated with 
multiple chemotherapeutic agents (cyclophosphamide, doxorubicin, 
vincristine, prednisolone, chlorambucil, bleomycin and etoposide) 
for lymphoma and subsequently developed caecal adenocarcinoma. 
SBSD resembles SBS25 (cosine similarity, 0.9), which has previously 
been found in Hodgkin’s lymphoma cell lines from two patients who 
were treated with chemotherapy””*. To our knowledge, this is the first 
time that the mutational consequences of chemotherapy have been 
demonstrated in normal human cells in vivo. The mutational burden in
the colorectal epithelium of this individual was three- to fivefold higher
than that expected for his age, and thus by extrapolation equivalent to that
of a 200-300-year-old.
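
The cosine similarity quoted above for SBSD and SBS25 compares two mutational spectra as vectors of per-channel proportions. The following is a minimal sketch of that measure, not the authors' code; the four-channel toy spectra are hypothetical values chosen only for illustration (real SBS spectra have 96 trinucleotide channels).

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two mutational spectra."""
    if len(a) != len(b):
        raise ValueError("spectra must have the same number of channels")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("spectra must be non-zero")
    return dot / (norm_a * norm_b)


# Hypothetical toy spectra, NOT the real SBSD or SBS25 profiles.
toy_signature_1 = [0.1, 0.6, 0.2, 0.1]
toy_signature_2 = [0.2, 0.5, 0.2, 0.1]

print(round(cosine_similarity(toy_signature_1, toy_signature_2), 3))
```

A value of 1 means the two spectra have identical shape; for non-negative spectra the value cannot fall below 0.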


Copy-number changes and structural variants 


Copy-number changes and/or structural variants were found in 80 out 
of 449 (18%) crypts that could be evaluated. Five crypts exhibited eight 
whole-chromosome copy-number increases that affected the same 
three chromosomes (3, 7 and 9) as well as the X chromosome (Extended 
Data Fig. 7a). Thus, copy-number increases clustered in certain crypts 
and tended to affect certain chromosomes. No whole-chromosome 
losses were observed. Arm-level copy-number increases that affect 
chromosome 7 are common in colorectal cancers” and adenomas*. 
Copy-number increases in chromosomes 3 and 9 are seen in colorectal
cancers, but these chromosomes are almost as frequently deleted”. Copy-neutral loss 
of heterozygosity (CN-LOH) was observed in 12 crypts, and affected 
chromosomes 1p, 6p, 7p, 8q, 9q, 10q (twice), 17p, 17q, 18q, 21q and 22q 
(Extended Data Fig. 7c). CN-LOH is frequently observed in colorectal 
cancers, although the specific changes that we observe here are not 
recurrent features”’. Five copy-number changes could be timed and all 
were estimated to have occurred in adulthood (Extended Data Fig. 7b). 
Two changes that affected the same crypt appeared to be synchro- 
nous (Supplementary Information). An analysis of structural variants 
detected 48 large deletions, 18 tandem duplications, 4 translocations 
and 2 inversions (Extended Data Fig. 7d, Supplementary Information). 
Each structural variant was restricted to a single crypt, except for one 
deletion that was present in two adjacent crypts that share few muta- 
tions, indicating that it occurred during gestation or early childhood. 


Driver mutations 


Driver mutations are those that confer a selective advantage during 
cancer evolution. To search for driver mutations in normal colon, the 
whole-genome sequences of 571 crypts were supplemented with targeted
sequencing of 90 known colorectal cancer genes (Supplementary 
Table 4) in additional crypts. In total, substitutions in these genes could 
be evaluated in 1,403 crypts and indels in 1,046. Statistical analysis provided
evidence of positive selection on the recessive cancer genes AXIN2 
(three truncating mutations; adjusted q value, 0.004) and STAG2 (two 
truncating mutations; adjusted q value, 0.038) indicating that these 
mutations are probably drivers. Additional mutations that are likely to be 
drivers were identified in cancer genes with canonical missense hotspot 
mutations. Nine hotspot mutations in PIK3CA (E542K, R38H), ERBB2 
(R678Q, V842I, T862A), ERBB3 (R475W, R667L) and FBXW7 (R505C, 
R658Q) were observed (Extended Data Fig. 8). Given the specificity 
of these hotspot mutations, most are likely to be drivers. In addition, 
heterozygous truncating mutations were found in the recessive cancer 
genes ARID2, ATM (two), ATR, BRCA2, CDK12 (two), CDKN1B, RNF43 (two), 
TBL1XR1 and TP53 (Supplementary Table 5). There was no statistical 
evidence for selection of truncating mutations in the set of 90 colorectal 
cancer genes overall. The possibility that some have conferred a growth 
advantage, however, is not excluded. None of the analysed crypts carried 
more than one putative driver mutation. 

Twenty-three pairs of adjacent crypts shared over 100 SBS1 mutations 
and thus were probably generated by postnatal fission of crypts. Two 
pairs carried driver mutations (one nonsense mutation in AXIN2 and 
one E542K in PIK3CA), although the association of driver mutations with 
crypt fission is not significant (P= 0.17). In one sister crypt the AXIN2 
mutation was rendered homozygous by CN-LOH of chromosome 17q, 
which demonstrates that clonal evolution is ongoing in normal colon 
(Figs. 4, 3b). 

On the basis of the conservative assumption that just the AXIN2 and 
STAG2 truncating mutations and the missense hotspot mutations in 
PIK3CA, ERBB2, ERBB3 and FBXW7 are drivers, around 1% of normal 
colorectal crypts (approximately 150,000 crypts) in a 50-60-year-old 
individual (the mean age of crypts that were assessed for driver mutations
in our cohort was 53 years) carry a driver mutation. As around 40% 
of people over 70 years old have an adenoma on colonoscopy” and 
around 5% of people develop colorectal cancer over their lifetime (and 
some of these may arise from more recently acquired driver mutations), 
only an extremely small proportion of these crypt microneoplasms 
become a macroscopically detectable adenoma (less than 1 in 375,000) 
or carcinoma (less than 1 in 3 million) within the following few decades. 

Fig. 4 | An inactivating mutation in AXIN2. a, A section after dissection. Red 
dots represent crypts with the AXIN2 mutation; blue dots represent those 
without it. Crypts without dots were not sequenced successfully. b, Crypts with 
the mutation appeared no different to others. c, CN-LOH of one crypt over the 
AXIN2 locus. The copy-number state (y axis) for every chromosome is shown, 
with one allele coloured red and the other green. d, JBrowse image of reads that 
support the AXIN2 mutation. The mutation is red. Of 29 reads, 25 support the 
mutation in the crypt with CN-LOH; the 4 that do not presumably represent 
stromal contamination. 
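
The upper bounds quoted above follow from dividing the population frequencies by the number of driver-carrying crypts per person. The short sketch below reproduces that arithmetic with exact fractions. It is an illustration rather than the authors' calculation: it simplifies by counting one lesion per affected person, and the total crypt count is only what the quoted 1% and 150,000 figures imply.

```python
from fractions import Fraction

# Figures quoted in the text.
driver_crypts = 150_000                   # crypts per person carrying a driver
driver_fraction = Fraction(1, 100)        # about 1% of crypts
adenoma_prevalence = Fraction(40, 100)    # people over 70 with an adenoma
lifetime_cancer_risk = Fraction(5, 100)   # people who develop colorectal cancer

# Total crypts per colon implied by the two figures above.
total_crypts = driver_crypts / driver_fraction

# Upper bounds on the chance that one driver-carrying crypt progresses,
# assuming (as a simplification) one lesion per affected person.
adenoma_per_driver_crypt = adenoma_prevalence / driver_crypts
carcinoma_per_driver_crypt = lifetime_cancer_risk / driver_crypts

print(f"implied crypts per colon: {int(total_crypts):,}")
print(f"adenoma per driver crypt: at most {adenoma_per_driver_crypt}")
print(f"carcinoma per driver crypt: at most {carcinoma_per_driver_crypt}")
```

The two bounds come out as 1/375000 and 1/3000000, matching the figures in the text.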


Clonal dynamics of normal epithelium 

The distribution of allele fractions of mutations within the crypt informs 
on the dynamics of turnover of the crypt stem cells. We estimate that 
the average time since the most recent common ancestor of crypts is 
5.5 years (95% confidence interval, 1-10.5), which is similar to previous 
estimates. Our data are compatible with previous estimates of 7 active 
stem cells and 1.3 stem cell replacements per year“, or 5 stem cells and 
0.6 stem cell replacements per year’, but we cannot exclude the pos- 
sibility of a larger number of stem cells that turn over more frequently 
(Extended Data Fig. 10, Supplementary Information). The microdis- 
section approach also enabled us to investigate the clonal structure of 
colonic epithelium beyond the crypt. By comparing the genetic relat- 
edness of crypts with their spatial relatedness, we estimate that crypts 
undergo fission at a mean rate of once every 27 years (95% confidence 
interval, 15.9-47.6) (Extended Data Fig. 10, Supplementary Information). 


Comparisons with colorectal cancer 


There are marked differences between the genomes of normal colorectal 
stem cells and those of colorectal cancers. The total mutational burdens 
of substitutions (10,000-20,000) and indels (1,000-2,000) that are 
found in most colorectal carcinomas (excluding those with hypermutator
phenotypes, in which the burden is usually more than 10-fold higher) 
are higher than the approximately 3,000 substitutions and 300 indels 
that are found in most normal crypts from 50-60-year-old individuals 
(Extended Data Fig. 11a). These differences may be underestimated, as 
the most recent common ancestor of cancers probably predates that of 
normal crypts. The high mutational burdens and associated mutational 
signatures of DNA mismatch repair deficiency and/or mutations in DNA 
polymerase ε or δ were not found in any normal colorectal crypts, but 
are present in around 20% of colorectal cancers. Equally striking is the 
difference between the 0-4 structural changes per normal crypt (with 
the majority having none; Supplementary Information) and the tens to 
hundreds per colorectal cancer*. In all of these respects, the genomes 
of normal crypts with driver mutations were similar to those of normal 
crypts without drivers (Extended Data Fig. 9). 

There was no difference in the burden of either sporadic or ubiqui- 
tous mutational processes between the crypts of individuals with and 
without a colorectal cancer (Supplementary Information). If differences 
in mutational processes in normal cells do underlie why some people 
develop colon cancer and others do not, these mutational processes 
must affect only a small proportion of crypts in the colon, or only exert 
subtle effects on the rate of mutation, such that we could not detect 
differences between the two groups. The increased base substitution 
and indel mutation loads in cancers are due to a combination of higher 
burdens of the ubiquitous mutational signatures found in normal crypts, 
additional signatures thus far found exclusively in cancers (confirming 
previous reports*“*) and more copy-number changes and structural vari- 
ation (Extended Data Fig. 11a). The causes of some of these additional 
mutations in cancer are known (for example, defective mismatch repair 
and mutations in DNA polymerase ε or δ) but the majority are uncertain. 

The relative frequencies of mutated cancer genes differ between colo- 
rectal adenomas or carcinomas and normal colorectal cells (P=0.003; 
Supplementary Information, Extended Data Fig. 11a). Mutations in APC, 
KRAS and TP53 are common in colorectal cancer” (accounting for 56% of 
base-substitution and indel driver mutations; Supplementary Methods) 
but comparatively rare among normal crypts with driver mutations 
(1 in 14). By contrast, mutations in, for example, ERBB2 and ERBB3 are 
common in normal crypts with driver mutations (5 in 14), but rare in 
colorectal cancer (7 in 631). In the case of APC (but not KRAS, and perhaps
not TP53), biallelic inactivation may be required to confer a strong 
growth advantage; this helps to explain why APC may be mutated less 
frequently in normal colon than ERBB2 or ERBB3 (which require a single 
hit to do so). The results suggest that mutations in APC, KRAS and TP53 
confer higher likelihoods of conversion to adenoma and carcinoma 
than mutations in ERBB2 and ERBB3, whereas the latter confer higher 
likelihoods of crypt colonization by stem cells. There was no detectable 
difference in the frequency of driver mutations between individuals 
in our cohort who had colorectal cancer and those who did not (Sup- 
plementary Information). 


Discussion 

This study has characterized all classes of somatic mutation in hundreds 
of normal colorectal epithelial stem cells. Our experimental design 
allows us to gain insights into different facets of the earliest stages of the 
clonal evolution of colorectal cancers; namely, the range of mutational 
processes, the frequency of driver mutations and the clonal dynamics 
of colonic stem cells. 

A substantial repertoire of base-substitution and indel mutational 
processes is operative (some of which are ubiquitous and some spo- 
radic), together with relatively infrequent copy-number changes and 
genome rearrangements. DNA editing by APOBEC cytidine deaminases 
occurs in normal colon, albeit only rarely. Many signatures, however, are 
of unknown aetiology and some appear to be acquired early in life. The 
presence of five times the age-standard mutational load in all colorectal 
cells—and potentially many other tissues—in an individual who had 
undergone chemotherapy provides new insight into the effect of such 
exposures, and raises questions pertaining to the relationship between 
mutational load and the relatively modest effect of chemotherapy on 
cancer risk*. 

Herein, we have revealed the earliest stages of colorectal cancer 
development. They are characterized by numerous crypts that carry 
driver mutations, of which only a very small fraction ever manifest as 
macroscopic neoplasms. Certain mutated cancer genes appear to foster 
this pervasive and invisible wave of microneoplastic change, whereas 
others particularly engender progression to colorectal adenoma and 
cancer. The conversion of these early microneoplasms to more advanced 
stages of colorectal neoplasia is associated with the acquisition of 
increased mutational loads that are composed of base substitutions, 
indels, structural variants and copy-number changes. More extensive 
studies of colorectal epithelium will enable the rarer intermediate stages 
between these early clones and small adenomas to be characterized, 
and will refine our understanding of the development of the subset of 
microneoplasms that are more likely to become carcinomas. 

The proportion of normal colorectal epithelial cells with driver muta- 
tions (1%) is, however, substantially lower than that of other normal tis- 
sues so far studied—notably skin (30%)"° and oesophagus (over 50%)"*. 
This may be, at least in part, a consequence of the modular structure 
of glandular epithelia. The small number of stem cells within a crypt 
diminishes the probability that a cell with a driver mutation will out- 
compete its wild-type neighbours. Moreover, even if it does colonize 
the crypt, a mutant stem cell is entombed in it unless the cell can over- 
come the largely unknown forces that control clonal expansion through 
crypt fission. The lower burden of driver mutations in colon relative to 
endometrium» (which is also glandular) remains to be investigated. 

Fundamental questions are being addressed with respect to differ- 
ences in the incidence rates of cancer between tissues. The somatic 
mutational burden in colon and ileum is similar despite the substantially 
higher incidence of cancer in colon (as previously noted*), and there- 
fore does not appear to account for this difference. Whether the total 
burden of microneoplastic change across the colon and in other tissues 
more closely correlates with these differences is yet to be determined. 

Finally, this study provides a reference perspective on the mutational 
signatures and driver mutations in normal colon, against which disease 
states of inflammatory, genetic, neoplastic, degenerative and other 
aetiologies can be compared. Similar surveys conducted across the 
range of normal cell types will inform on the universal process of somatic 
evolution in the human body in health and disease. 


Methods 


Data reporting 

No statistical methods were used to predetermine sample size. The 
experiments were not randomized and the investigators were not 
blinded to allocation during experiments and outcome assessment. 


Human tissues 

We obtained healthy colonic biopsies from four cohorts (Supplementary 
Table 1). The first represents seven deceased organ donors ranging in 
age from 36 to 67, from whom colonic and small-intestinal biopsies were 
taken at the time of organ donation (REC 15/EE/0152). The second rep- 
resents individuals aged 60 to 72 who were having a colonoscopy after 
a positive faecal occult blood test as part of the bowel cancer screening 
programme (Ethical approval O8-HO308-13); we selected 16 individu- 
als who were not found to have either an adenoma or a carcinoma on 
colonoscopy, and 15 who were found to have a colorectal carcinoma 
(the normal biopsies that we used were distant from these lesions). 
The third cohort represents three paediatric patients who underwent 
routine colonoscopy to exclude inflammatory bowel disease and who 
were found to have a completely normal intestinal mucosa macroscopically
and histologically (REC 12/EE/0482). The final cohort comprised 
one 78-year-old man with oesophageal cancer who underwent a rapid 
autopsy (REC 13/EE/0043). This individual had been treated with pal- 
liative chemotherapy of epirubicin, oxaliplatin and capecitabine within 
the three months before death; given that monoclonal conversion 
within crypts takes place on the scale of years, mutations caused by 
these chemotherapies are likely to be restricted to a small proportion 
of stem cells per crypt and so unlikely to be detected. All samples were 
obtained with informed consent and the studies were approved by East 
of England Research Ethics Committees. 


Laser-capture microdissection of colonic crypts 

Fresh frozen biopsies were embedded in optimal cutting temperature 
compound. Sections of 30 μm were fixed in methanol for 5 min, washed 
three times with phosphate-buffered saline and stained with Gill’s hae- 
matoxylin for 20 s. Crypts were isolated by laser-capture microdissec- 
tion, and collected in separate wells of a 96-well plate. They were lysed 
using the Arcturus PicoPure Kit (Applied Biosystems) according to the 
manufacturer’s instructions. DNA library preparation then proceeded 
without clean-up or quantification. 


Library preparation 

Two library-preparation methods were used for laser-capture microdissected
material: in initial experiments, sonication was used to fragment 
DNA; and later, an enzymatic fragmentation method was implemented 
as it could make libraries from even lower input. Comparison of the two 
methods showed no difference in mutation calls once post-processing 
filters (described below) had been implemented. All samples in this 
study were processed using an Agilent Bravo Workstation (Option B; 
Agilent Technologies). 

For sonication libraries, lysate from laser-capture microdissection 
(20 μl) was mixed with 100 μl TE buffer (10 mM Tris-HCl, 1 mM EDTA) 
(Ambion) and DNA was fragmented using focused acoustics (Covaris 
LE220; Covaris, Inc.). Fragmented DNA was mixed with 80 μl Ampure XP 
beads (Beckman Coulter). After a 5-min binding reaction and magnetic 
bead separation, genomic DNA was washed twice with 75% ethanol. 
Beads were resuspended in 20 μl nuclease-free water (Ambion) and processed
immediately for DNA library construction. Each sample (20 μl) 
was mixed with 2.8 μl NEBNext Ultra II End Prep Reaction Buffer and 
1.25 μl NEBNext Ultra II End Prep Enzyme Mix (New England BioLabs), and 
incubated on a thermal cycler for 30 min at 20 °C then 30 min at 65 °C. 
Following DNA fragmentation and A-tailing, each sample was incubated 
for 20 min at 20 °C with a mixture of 30 μl ligation mix and 1 μl ligation 
enhancer (New England BioLabs), 0.9 μl nuclease-free water (Ambion) 
and 0.1 μl duplexed adapters (100 μM; 
5’-ACACTCTTTCCCTACACGACGCTCTTCCGATC*T-3’, 
5’-phos-GATCGGAAGAGCGGTTCAGCAGGAATGCCGAG-3’). Adapter-ligated libraries 
were purified using Ampure XP beads by addition of 65 μl Ampure XP 
solution (Beckman Coulter) and 65 μl TE buffer (Ambion). After elution 
and bead separation, DNA libraries (21.5 μl) were amplified by PCR by 
addition of 25 μl KAPA HiFi HotStart ReadyMix (KAPA Biosystems), 1 μl 
PE1.0 primer (100 μM; 
5’-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATC*T-3’) 
and 2.5 μl iPCR-Tag (40 μM; 
5’-CAAGCAGAAGACGGCATACGAGATXGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATC-3’), 
in which ‘X’ represents one of 96 unique 8-base indexes. The sample 
was then mixed and thermal-cycled as follows: 98 °C for 5 min, then 12 
cycles of 98 °C for 30 s, 65 °C for 30 s, 72 °C for 1 min and finally 72 °C 
for 5 min. Amplified libraries were purified using a 0.7:1 volumetric 
ratio of Ampure Beads (Beckman Coulter) to PCR product and eluted 
into 25 μl nuclease-free water (Ambion). DNA libraries were adjusted 
to 2.4 nM and sequenced on the HiSeq X platform (Illumina) according 
to the manufacturer’s instructions, with the exception that we used 
iPCR-Tag (5’-AAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTC-3’) 
to read the library index. 

For enzymatic fragmentation, lysate from laser-capture microdissection
(20 μl) was mixed with 50 μl Ampure XP beads (Beckman Coulter) 
and 50 μl TE buffer (10 mM Tris-HCl, 1 mM EDTA) (Ambion) at room 
temperature. After a 5-min binding reaction and magnetic bead separation,
genomic DNA was washed twice with 75% ethanol. Beads were 
resuspended in 26 μl TE buffer and the bead-genomic DNA slurry was 
processed immediately for DNA library construction. Each sample 
(26 μl) was mixed with 7 μl 5X Ultra II FS buffer and 2 μl Ultra II FS enzyme 
(New England BioLabs), and incubated on a thermal cycler for 12 min at 
37 °C then 30 min at 65 °C. Following DNA fragmentation and A-tailing, 
each sample was incubated for 20 min at 20 °C with a mixture of 30 μl 
ligation mix and 1 μl ligation enhancer (New England BioLabs), 0.9 μl 
nuclease-free water (Ambion) and 0.1 μl duplexed adapters (100 μM; 
5’-ACACTCTTTCCCTACACGACGCTCTTCCGATC*T-3’, 
5’-phos-GATCGGAAGAGCGGTTCAGCAGGAATGCCGAG-3’). Adapter-ligated libraries 
were purified using Ampure XP beads by addition of 65 μl Ampure XP 
solution (Beckman Coulter) and 65 μl TE buffer (Ambion). After elution 
and bead separation, DNA libraries (21.5 μl) were amplified by PCR by 
addition of 25 μl KAPA HiFi HotStart ReadyMix (KAPA Biosystems), 1 μl 
PE1.0 primer (100 μM; 
5’-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATC*T-3’) 
and 2.5 μl iPCR-Tag (40 μM; 
5’-CAAGCAGAAGACGGCATACGAGATXGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATC-3’), 
in which ‘X’ represents one of 96 unique 8-base indexes. The sample 
was then mixed and thermal-cycled as follows: 98 °C for 5 min, then 12 
cycles of 98 °C for 30 s, 65 °C for 30 s, 72 °C for 1 min and finally 72 °C 
for 5 min. Amplified libraries were purified 
using a 0.7:1 volumetric ratio of Ampure Beads (Beckman Coulter) to 
PCR product and eluted into 25 μl nuclease-free water (Ambion). DNA 
libraries were adjusted to 2.4 nM and sequenced on the HiSeq X platform
(Illumina) according to the manufacturer’s instructions, with the 
exception that we used iPCR-Tag 
(5’-AAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTC-3’) to read the library index. 
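
As a consistency check on the oligonucleotides listed above, the index-read primer can be compared with the iPCR-Tag sequence. The sketch below is illustrative only and not part of the published protocol. It shows that the primer is the reverse complement of the 35 bases that immediately follow the index position ‘X’ in iPCR-Tag, which is consistent with a primer that anneals next to the index and reads into it. The variable names are hypothetical; the sequences are those given in the text.

```python
# Map each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# iPCR-Tag split at the index position 'X' (sequences from the text).
IPCR_TAG_BEFORE_INDEX = "CAAGCAGAAGACGGCATACGAGAT"
IPCR_TAG_AFTER_INDEX = "GAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATC"

# Primer used to read the library index.
INDEX_READ_PRIMER = "AAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTC"

annealing_site = reverse_complement(INDEX_READ_PRIMER)
print(annealing_site)
print(IPCR_TAG_AFTER_INDEX.startswith(annealing_site))
```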


Whole-genome sequencing 

We generated paired-end sequencing reads (150 bp) using Illumina 
XTEN® machines, resulting in a coverage of around 15x per sample. In 
94% of the whole-genome-sequenced crypts that were included for 
statistical analysis, over 90% of the callable genome was covered by 
more than 10 reads. Sequences were aligned to the human reference 
genome (NCBI build37) using BWA-MEM. 


Targeted sequencing 

A 2.3-Mb capture panel was designed in-house to pull down genes that 
are known or suspected to play a role in neoplasia. We performed custom 
RNA bait design following the manufacturer’s guidelines (SureSelect; 
Agilent). Samples were multiplexed on flow cells and subjected to paired- 
end sequencing (75-bp reads) using Illumina HiSeq2000 machines. 
One 96-well plate of samples was sequenced on each lane, but as tissue 
recovery was variable, a range of coverage was achieved. Sequences were 
aligned to the human reference genome (NCBI build37) using BWA-align. 


Calling substitutions 

Substitution calling was broken down into three steps: mutation discov- 
ery; filtering to produce a list of clean sites; and genotyping, in which 
the presence or absence of every mutation in every sample is evaluated. 

First, mutations were discovered using the cancer variants through 
expectation maximization (CaVEMan) algorithm**. CaVEMan uses a 
naive Bayesian classifier to derive the probability of all possible geno- 
types at each nucleotide. CaVEMan copy-number options were set to 
major copy number 5 and minor copy number 2 for normal clones, as in 
our experience this maximizes sensitivity. The algorithm was run using 
an unmatched normal to be able to derive phylogenies: had another 
sample from the same individual been treated as a matched normal, 
early embryonic mutations would have been treated as germline and 
discarded, resulting in incorrect trees. 

Second, a number of post-processing filters were applied. These 
included filtering against a panel of 75 unmatched normal samples 
to remove common single-nucleotide polymorphisms (SNPs), post- 
processing as described previously” and two filters (only applied to 
whole-genome-sequencing data) designed to remove mapping artefacts 
associated with BWA-MEM: the median alignment score of reads sup- 
porting a mutation should be greater than or equal to 140, and fewer than 
half of these reads should be clipped. The library-preparation protocol 
for microbiopsies produced shorter library insert sizes than standard 
methods. Reads could therefore overlap, resulting in double counting of 
mutant reads. Fragment-based statistics were generated to prevent the 
calling of variants that were supported by a low number of fragments. 
Variants were annotated by ANNOVAR“ and fragment-based statis- 
tics (fragment coverage, number of fragments supporting the variant, 
fragment-based allele fraction) were calculated for each variant after the 
exclusion of marked PCR duplicates. In the rare event of discordance in 
the called base at the variant position between overlapping paired-end 
reads, the base with the highest-quality score was selected. Fragment- 
based statistics were calculated separately for high-quality fragments 
(alignment scores greater than or equal to 40 and base scores greater 
than or equal to 30). Variants that were supported by at least three high-quality
fragments were retained and used for the next stage of variant 
filtering. Inspection of variants specific to laser-capture microdissec- 
tion experiments revealed that the vast majority were present within 
inverted repeats capable of forming hairpin structures, that they were 
supported by reads with very similar alignment start position (and 
so not marked as PCR duplicates) and were primarily located close to 
the alignment start within the supporting reads. These variants com- 
monly coincided with other proximal variants (1-30 bp), but filtering 
based on variant proximity would also remove actual kataegis events. In 
silico modelling of the potential hairpin showed that the variants were 
aligning to each other in the stem of the structure, but could not form a 
base pair, whereas all other bases could. The artefacts are probably the 
consequence of erroneous processing of cruciform DNA (either exist- 
ing before DNA isolation or formed during library preparation) by the 
enzymatic digestion protocol applied. We considered modelling the 
hairpin structures to filter these variants, but given the fact that read 
clustering (i.e., similar alignment position) serves as a hallmark for these 
artefacts, we opted to use the proximity of the variant to the alignment 
start, and the standard deviation (s.d.) and median absolute deviation 
(MAD) of the variant position within the supporting reads, as features 
for filtering. These statistics were calculated separately for reads aligned 
to positive and negative strands. In cases in which the variant was supported
by a low number of reads (i.e., 0-1 reads) for one of the strands, 
the filtering was based only on the statistics generated for the other 
strand. Per variant, if one of the strands had too few supporting reads, 
it was required for the other strand that either: i) there should be 90% or 
more supporting reads to report the variant within the first 15% of the 
read starting from the alignment start; or ii) the statistics MAD > 0 and 
s.d. > 4. Per variant, if both strands were supported by sufficient reads 
it was required for both strands separately that: i) there should be 90% 
or more supporting reads to report the variant within the first 15% of the 
read; ii) the statistics MAD > 2 and s.d. > 2; or iii) the other strand should 
have the statistics MAD > 1 and s.d. > 10 (i.e., the variant is retained if the 
other strand demonstrates strong measures of variance). In our experience,
the proposed strategy greatly reduces the number of artefactual 
variants while retaining all other variants—as assessed by running the 
last filtering step on whole-genome-sequencing data from experiments 
that were not laser-capture microdissections. 
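
The per-strand features used by this filter (the share of supporting reads that carry the variant near the alignment start, and the MAD and s.d. of the variant position within reads) could be computed along the following lines. This is a sketch under stated assumptions, not the authors' implementation: offsets are taken as 0-based distances from the alignment start, the 15% boundary is treated as inclusive, and the sample standard deviation is used. The function name and inputs are hypothetical, and the thresholds from the text are deliberately not applied here.

```python
import statistics
from typing import Dict, List


def position_features(offsets: List[int], read_lengths: List[int]) -> Dict[str, float]:
    """Summarise where a variant sits within its supporting reads on one strand.

    offsets: distance of the variant from the alignment start, one per read.
    read_lengths: length of each supporting read, in the same order.
    """
    if not offsets or len(offsets) != len(read_lengths):
        raise ValueError("need one read length per supporting read")
    # Reads that carry the variant within the first 15% of the read.
    near_start = sum(1 for o, n in zip(offsets, read_lengths) if o <= 0.15 * n)
    median = statistics.median(offsets)
    # Median absolute deviation of the variant position.
    mad = statistics.median([abs(o - median) for o in offsets])
    # Sample standard deviation (an assumption; undefined for a single read).
    sd = statistics.stdev(offsets) if len(offsets) > 1 else 0.0
    return {"fraction_near_start": near_start / len(offsets), "mad": mad, "sd": sd}


# Reads piled up at nearly the same offset, as described for the artefacts.
clustered = position_features([2, 3, 2, 3, 2], [150] * 5)
# Reads in which the variant position is spread along the read.
spread = position_features([10, 40, 75, 110, 140], [150] * 5)
print(clustered)
print(spread)
```

Clustered support gives a MAD of zero and a small s.d., whereas support spread along the read gives large values for both, which is the contrast the filter relies on.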

Third, mutations were genotyped in every sample. A pile-up of all the 
samples from a given individual was constructed, in which the numbers 
of mutant and wild-type reads in every sample over every site that had 
been called in any sample from that person were counted. Only reads 
with a mapping quality of 30 or above and bases with a base quality of 
30 or above were counted. After applying these filters, mutations were 
genotyped on the basis of the number of mutant and wild-type reads 
at each locus. Mutations were called on the basis of a VAF greater than 
0.2, a depth greater than 7 and at least 4 mutant reads. If the depth over 
a locus was less than 7 in a given sample, or if there was more than one 
mutant read but the other criteria were not met, the genotype was set to 
NA (not applicable) for tree-construction purposes. Loci that were set to 
NA in more than one third of the samples were removed for construction 
of the phylogeny. Positions were called as germline if they were either 
called as present or NA in all of the samples from a given individual. 
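
The genotyping rules in this paragraph can be written as a small decision function. The sketch below is an illustration of the stated thresholds, not the authors' code. Function names are hypothetical, `None` stands for NA, and read counts are assumed to have already passed the mapping-quality and base-quality filters.

```python
from typing import List, Optional


def genotype(mutant_reads: int, depth: int) -> Optional[bool]:
    """Return True (present), False (absent) or None (NA) for one locus in one sample."""
    vaf = mutant_reads / depth if depth > 0 else 0.0
    # Called as present: VAF > 0.2, depth > 7 and at least 4 mutant reads.
    if vaf > 0.2 and depth > 7 and mutant_reads >= 4:
        return True
    # NA: depth below 7, or more than one mutant read without meeting the criteria.
    if depth < 7 or mutant_reads > 1:
        return None
    return False


def keep_for_phylogeny(calls: List[Optional[bool]]) -> bool:
    """Drop loci that are NA in more than one third of the samples."""
    na_count = sum(1 for c in calls if c is None)
    return 3 * na_count <= len(calls)


def is_germline(calls: List[Optional[bool]]) -> bool:
    """Germline if every sample from the individual is present or NA."""
    return all(c is None or c for c in calls)


print(genotype(5, 20), genotype(3, 20), genotype(0, 20))
```

Note that a depth of exactly 7 is neither callable as present (which requires a depth greater than 7) nor NA on depth alone (which requires a depth below 7); in this sketch such a locus is absent unless it has more than one mutant read.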

Around 1.2% of all mutations were present in the coding regions of the 
genome. All mutations in coding regions are provided (Supplementary 
Table 3). 


Calling short indels 

As for substitutions, calling of indels was broken down into mutation 
discovery, filtering and genotyping. Mutations were called with the 
Pindel algorithm‘ using an unmatched normal. Post-processing filters 
were applied as described previously”, and the number of mutant and 
wild-type reads was tabulated as above. The same dataset-specific filters 
were applied as for substitutions. Indels were then genotyped on the 
basis of a VAF greater than 0.2, a depth of at least 10 and support of at 
least 5 mutant reads. 


Calling structural variants 

Genomic rearrangements were called using the BRASS algorithm 
(https://github.com/cancerit/BRASS). Abnormally paired read pairs 
from whole-genome sequencing were grouped and filtered by read 
remapping. Read-pair clusters for which more than 50% of the reads 
mapped to microbial sequences were removed, as were rearrange- 
ments for which the breakpoint could not be reassembled. Candidate 
breakpoints were matched to copy-number breakpoints defined by 
ASCAT (see below) within 10 kb. Only structural variants in which the 
two breakpoints were more than 1,000 base pairs apart were consid- 
ered. Structural variants were called against a matched normal skin or 
blood sample when available and against another crypt from the same 
individual with good coverage when not. 


Calling copy number 

Copy-number changes were called using the allele-specific copy number 
analysis of tumours (ASCAT) algorithm. The same matched normal 
sample was used as for calling structural variants. For additional validation 
of copy-number changes in normal colon, the QDNAseq algorithm 
was run. ASCAT uses both the read depth and ratios of heterozygous 
SNPs to determine an allele-specific copy number, whereas QDNAseq 
relies solely on variations in sequencing coverage. To call amplifications 
and deletions in the colonic microbiopsy cohort, only those that both 
were called by ASCAT and showed a clear departure from the background 
log2-transformed ratio by QDNAseq were retained. To call CN-LOH in 
this cohort, all such events called by ASCAT were checked visually on 
JBrowse to verify an imbalance of parental SNPs. Only crypts with a 
coverage of more than 10x, for which copy-number changes could be 
reliably detected, were used. 


Detection of driver variants and positive selection 

Driver mutations were detected through both an unbiased dN/dS 
method and manual annotation. For these analyses, the CaVEMan and 
Pindel calls were used without post-processing filters (such as requiring 
a VAF cut-off of greater than 0.2) to maximize our sensitivity. All putative 
driver variants were visually inspected using JBrowse, thus we could 
afford a higher false-positive rate in the mutation discovery phase. 

The statistical model dNdScv was used to conduct three tests: first, 
using only the whole-genome-sequencing data, an analysis of selection 
over all genes; second, using combined whole-genome and targeted- 
sequencing data, over all the genes covered by the bait set; and finally, 
again using this combined dataset, over 90 selected cancer genes (Sup- 
plementary Table 4, Supplementary Information). 

Manual annotation of driver variants on the basis of previous knowledge 
complemented this. A list of 90 colorectal cancer genes (appendix) 
that were curated from the literature and also covered by the bait set was 
intersected with the list of substitutions and indels from combined 
whole-genome and targeted sequencing. Mutations were annotated 
as putative drivers either if they were missense mutations that fell in an 
oncogene hotspot (on the basis of visualization of the distribution of 
mutations in the gene on COSMIC), or if they were truncating mutations 
that fell in a tumour suppressor gene. 
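
The annotation rule can be expressed as a simple predicate. The sketch below is illustrative only: the hotspot codons and gene sets passed in are hypothetical stand-ins for the manually curated lists, and the set of consequences counted as truncating (nonsense, frameshift and essential splice-site) is an assumption that the text does not spell out.

```python
from typing import Dict, Optional, Set

# Assumption: consequence labels treated as truncating.
TRUNCATING = {"nonsense", "frameshift", "essential_splice"}


def is_putative_driver(
    gene: str,
    consequence: str,
    codon: Optional[int],
    oncogene_hotspots: Dict[str, Set[int]],
    tumour_suppressors: Set[str],
) -> bool:
    """Apply the two annotation rules to a single coding mutation."""
    if consequence == "missense":
        # Missense mutations count only if they fall in an oncogene hotspot.
        return codon in oncogene_hotspots.get(gene, set())
    if consequence in TRUNCATING:
        # Truncating mutations count only in tumour suppressor genes.
        return gene in tumour_suppressors
    return False


# Hypothetical curated inputs, for illustration only.
hotspots = {"KRAS": {12, 13}}
suppressors = {"APC"}
```

With these illustrative inputs, a KRAS missense mutation at codon 12 and an APC nonsense mutation would both be annotated as putative drivers, whereas a KRAS nonsense mutation would not.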

Structural variants that might act as drivers were assessed by intersec- 
tion of genes involved in each structural variant with the twelve genes 
involved in gene fusions that have been reported in colorectal cancer 
in COSMIC (VTI1A, TCF7L2, TPM3, NTRK1, PTPRK, RSPO3, ETV6, NTRK3, 
EIF3E, RSPO2, C2orf44 and ALK). No fusion genes were found. None of 
the genes involved in structural variants in our data overlapped with 
the list of 90 cancer genes used for assessing substitutions and indels, 
and there were no genes that were affected by more than one structural 
variant. No high-level copy-number amplifications were observed and 
there were no homozygous deletions. 

Note that the frequency of driver mutations is low and as such we 
cannot estimate a per-gene driver frequency. All we can do to derive 
a meaningful estimate is to pool our driver mutations. In addition, 
coverage may fluctuate even within a gene. Some portions of a gene 
may be well-covered in 1,000 crypts, and others in 2,000 crypts. The 
approach that we took was to calculate, for the average exonic base pair 
in our 90 cancer genes, the number of crypts in which that base pair 
was covered by 8 or more reads (for substitutions) and by 10 or more 
reads (for indels). Of all bases in the targeted panel across all crypts, 64% 
are covered by 8 or more reads, which equates to a number of callable 
bases equivalent to having sequenced about 1,400 crypts with perfect 
coverage over every base in every crypt. This average number of crypts 
in which all base pairs achieve good coverage becomes the denominator 
for calculating the frequency of driver mutations (with the number of 
drivers observed in the dataset as the numerator). A similar approach 
can be taken with indels. Our estimate of 1% uses a global correction, on 
the assumption that under-representation and over-representation will 
even out when estimating the total frequency of driver mutations 
in the whole dataset. 
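
The arithmetic behind this global correction can be sketched as follows. The crypt count and the number of drivers used in the example are hypothetical values chosen only to reproduce the orders of magnitude quoted above (64% callable, about 1,400 crypt-equivalents, about 1%); they are not taken from the dataset.

```python
def effective_crypts(n_crypts: int, fraction_callable: float) -> float:
    """Number of crypts with perfect coverage that would give the same
    number of callable bases (the denominator for the driver frequency)."""
    return n_crypts * fraction_callable


def driver_frequency(n_drivers: int, n_effective_crypts: float) -> float:
    """Pooled frequency of driver mutations per crypt."""
    return n_drivers / n_effective_crypts


# Hypothetical inputs: 2,200 crypts with 64% of panel bases covered by
# 8 or more reads give roughly 1,400 crypt-equivalents.
denominator = effective_crypts(2200, 0.64)
# Hypothetical numerator of 14 pooled drivers gives a frequency near 1%.
frequency = driver_frequency(14, denominator)
```

The same calculation would apply to indels, with the coverage threshold raised to 10 or more reads.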


Estimation of the frequency of driver mutations in cancer 

Publicly available colorectal cancer mutation calls were obtained from 
The Cancer Genome Atlas (TCGA) network. Driver mutations were 
annotated manually in the same way as in our dataset: only mutations 
that fell in the 90 genes that we had selected were considered, and they 
were annotated as putative drivers either if they were missense mutations 
that fell in an oncogene hotspot (on the basis of visualization of 
the distribution of mutations in the gene on COSMIC), or if they were 
truncating mutations that fell in a tumour suppressor gene. 


Construction of phylogenies 

Phylogenies are used in this analysis for timing mutations. The most 
informative branches in this case are the long branches shared by a small 
number of crypts, which are very robust to all methods of tree construc- 
tion. Trees were built using maximum parsimony, with substitutions 
called as described above. For every individual, the input matrix of muta- 
tion calls was bootstrapped 100 times. Phylogenies were constructed 
for each replicate using the Wagner method of the Mix programme from 
the PHYLIP suite of tools. The consensus phylogeny was constructed 
from 100 bootstrap runs using the extended majority rule method for 
the Consense programme from the PHYLIP suite of tools. 
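
The bootstrapping step can be illustrated with a short sketch. It assumes that bootstrapping resamples mutations (the columns of the genotype matrix) with replacement, as is usual for phylogenetic bootstraps; the data layout and function name are hypothetical, and tree building itself (Mix and Consense in PHYLIP) is not reproduced here.

```python
import random
from typing import Dict, List


def bootstrap_mutation_matrix(
    matrix: Dict[str, List[int]], n_replicates: int = 100, seed: int = 0
) -> List[Dict[str, List[int]]]:
    """Resample mutations with replacement to build bootstrap replicates.

    `matrix` maps each crypt to its genotypes, one entry per mutation,
    with all crypts sharing the same mutation order.
    """
    rng = random.Random(seed)
    n_mutations = len(next(iter(matrix.values())))
    replicates = []
    for _ in range(n_replicates):
        # Draw as many mutation columns as in the original matrix.
        columns = [rng.randrange(n_mutations) for _ in range(n_mutations)]
        replicates.append(
            {crypt: [row[c] for c in columns] for crypt, row in matrix.items()}
        )
    return replicates
```

Each replicate would then be passed to the tree-building program, and the resulting trees summarized into a consensus.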

Across all phylogenies, a mean of 10% and a median of 1.5% of muta- 
tions per tree did not fit the trees perfectly. Phylogenies with more 
crypts had more mutations that fitted imperfectly. Consider a muta- 
tion that is actually present in 50 crypts. Even with 15x coverage over 
the site in every sample, and with every crypt completely clonal, if we 
simulate resampling of mutant reads from the binomial distribution 
(with size of 15 and probability of 0.5), 17% of the time the mutation will 
have fewer than the 3 reads required to call it in at least one sample. 
Variation in sequencing depth, clonality and sequencing errors would 
further decrease the probability of calling the mutation perfectly in 
every sample. Nodes across all our phylogenies had mean bootstrap- 
ping values of 0.77 and median bootstrapping values of 0.99. Branches 
at the very top of the phylogenies, which probably represent embryonic 
cell divisions, are supported by only a few mutations and have lower 
support because in a given bootstrap sample the couple of mutations 
that support this node may be omitted. Longer shared branches almost 
always have bootstrapping values of 1. These longer shared branches 
are those that are most important to our analyses, because they are the 
most informative when timing mutational signatures relative to one 
another and because they represent postnatal crypt fission events. To 
further increase our confidence in our phylogenies, we validated them 
by reconstructing them with indels. To do this, the same procedure as 
for substitutions was followed for indel matrices. As there were fewer 
indels than substitutions, nodes in indel phylogenies were generally 
reconstructed with lower confidence than in substitution phylogenies, 
but they broadly agree. Of the nodes reconstructed with 90% or greater 
confidence in the indel tree, 85% were present with exactly the same set 
of descendants in the substitution trees. Any errors in the phylogenies 
should be relatively minor and not affect our downstream analyses. 
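
The 17% figure quoted above can be checked with an exact binomial calculation rather than a simulation. The sketch below assumes, as in the text, 50 fully clonal crypts, a depth of 15 at the site in every crypt, an expected mutant read fraction of 0.5 and a minimum of 3 mutant reads; the function name is hypothetical.

```python
from math import comb


def prob_missed_in_any_sample(
    n_samples: int = 50, depth: int = 15, p_mutant: float = 0.5, min_reads: int = 3
) -> float:
    """Probability that at least one sample has fewer than `min_reads` mutant reads."""
    # Probability that a single sample has fewer than `min_reads` mutant reads.
    p_below = sum(
        comb(depth, k) * p_mutant**k * (1 - p_mutant) ** (depth - k)
        for k in range(min_reads)
    )
    # Samples are treated as independent draws.
    return 1 - (1 - p_below) ** n_samples


p = prob_missed_in_any_sample()  # approximately 0.17
```

Although each individual crypt misses the threshold less than 0.4% of the time, the chance that at least one of 50 crypts does so is much larger, which is why larger phylogenies contain more imperfectly fitting mutations.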

The program that was used for inferring phylogeny provided the 
topology of the tree but not the assignment of mutations. Mutations 
from the input matrix of genotypes therefore have to be reassigned to 
branches. To assign a set of mutation calls with no false negatives and no 
false positives to a tree, each branch of the tree was considered in turn. 
If a mutation was called in all the descendants of a given branch, and in 
no samples that were not descendants of the branch, it was 
assigned to that branch. 
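
This assignment rule amounts to matching the set of samples in which a mutation was called against the set of descendants of each branch. A minimal sketch, assuming hypothetical branch identifiers and a simple dictionary representation of the tree:

```python
from typing import Dict, List, Set


def assign_mutations(
    branches: Dict[str, Set[str]], calls: Dict[str, Set[str]]
) -> Dict[str, List[str]]:
    """Assign each mutation to the branch whose descendants match it exactly.

    `branches` maps a branch identifier to the set of samples descended from it;
    `calls` maps a mutation identifier to the set of samples in which it was called.
    Mutations that match no branch perfectly are left unassigned.
    """
    by_descendants = {frozenset(desc): branch for branch, desc in branches.items()}
    assigned: Dict[str, List[str]] = {branch: [] for branch in branches}
    for mutation, samples in calls.items():
        # Present in all descendants and in no other sample means set equality.
        branch = by_descendants.get(frozenset(samples))
        if branch is not None:
            assigned[branch].append(mutation)
    return assigned


# Hypothetical three-crypt tree in which crypts a and b share a branch.
example_branches = {"ab": {"a", "b"}, "a": {"a"}, "b": {"b"}, "c": {"c"}}
example_calls = {"m1": {"a", "b"}, "m2": {"a"}, "m3": {"a", "c"}}
example_assignment = assign_mutations(example_branches, example_calls)
```

In this toy example, m1 is assigned to the shared branch and m2 to the terminal branch of crypt a, whereas m3 fits no branch and is left unassigned.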

Some colonic microbiopsies had low coverage and stromal contamina- 
tion. For this reason, we did not expect mutations to fit the tree perfectly, 
as a mutation that was truly present in a colony might be missed if too 
few supporting reads are found. Mutations were only assigned to the 
tree to determine the mutational processes active at a particular time. 
We reasoned that it was preferable to assign only mutations that fit the 
tree perfectly and adjust the branch lengths based on the power to call 
mutations at a given branch, rather than attempting to assign muta- 
tions that fit the tree imperfectly. Using the clonality and coverage of 
all descendants of a branch, the proportion of true substitutions or 
indels on the branch that would be first discovered (whether by CaVEMan 
or Pindel) and then genotyped as present according to the criteria 
described above was calculated. The observed branch length was then 
adjusted by dividing by this proportion. Adjustment proportions can be 
found in Supplementary Table 7. This was done for both substitutions 
and indels, but not for structural variants and for larger copy-number 
changes, owing to a lack of data: most branches have no large variants 
and so could not be extended appropriately. Rearrangements and copy- 
number changes were assigned to phylogenies manually. 
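
As a worked illustration of the branch-length adjustment, the sketch below divides an observed count by the estimated proportion of true mutations that would be detected. The numbers are hypothetical and are not taken from Supplementary Table 7; how the proportion itself is derived from clonality and coverage is not reproduced here.

```python
def adjust_branch_length(observed: int, proportion_detectable: float) -> float:
    """Scale an observed branch length by the power to call mutations on it."""
    if not 0 < proportion_detectable <= 1:
        raise ValueError("proportion_detectable must be in (0, 1]")
    return observed / proportion_detectable


# Hypothetical branch: 80 mutations observed with an estimated 80% sensitivity.
adjusted = adjust_branch_length(80, 0.8)  # about 100 mutations
```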


Extraction of mutational signatures 

Mutational signatures were extracted using the mutations assigned to 
every branch of a phylogeny as a ‘sample’. This allows better discrimination 
of mutational processes that may occur at different times within 
the same cell. Mutations were categorized following the method used 
by the Mutational Signatures working group of the PCAWG. Single-base 
substitutions were categorized into 96 classes according to the 
identity of the pyrimidine-mutated base pair, and the base 5′ and 3′ to 
it. Doublet-base substitutions were categorized into 78 classes accord- 
ing to the identity of the reference and alternative bases. Indels were 
classified into 83 classes according to whether they were an insertion 
or a deletion, the identity of the inserted or deleted base, the length 
of the mononucleotide tract in which they occurred and the degree of 
homology with the surrounding sequence (Fig. 1a). 
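
The 96-class scheme for single-base substitutions can be sketched as follows. The function name and the bracketed label format are illustrative choices, not taken from the PCAWG code; the logic reports every substitution relative to the pyrimidine of the mutated base pair, reverse-complementing the context when the reference base is a purine.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def sbs96_class(trinucleotide: str, alt: str) -> str:
    """Classify a substitution given its reference context and alternative base.

    `trinucleotide` is the 5' base, the mutated base and the 3' base on the
    reference strand; `alt` is the alternative base on the same strand.
    """
    tri, alt = trinucleotide.upper(), alt.upper()
    if len(tri) != 3 or len(alt) != 1 or set(tri + alt) - set("ACGT") or alt == tri[1]:
        raise ValueError("expected a 3-base context and a different single alt base")
    if tri[1] in "AG":
        # Purine reference: describe the change on the opposite strand.
        tri = tri.translate(COMPLEMENT)[::-1]
        alt = alt.translate(COMPLEMENT)
    return f"{tri[0]}[{tri[1]}>{alt}]{tri[2]}"


# Enumerate every class: 2 pyrimidines x 3 alternatives x 16 flanking contexts.
ALL_CLASSES = sorted(
    {
        sbs96_class(five + ref + three, alt)
        for ref in "CT"
        for alt in "ACGT"
        if alt != ref
        for five in "ACGT"
        for three in "ACGT"
    }
)
```

For example, a G>A change in the context CGT is the same class as a C>T change in the context ACG, because the two describe the same event on opposite strands.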

Signatures were extracted using a hierarchical Dirichlet process 
(HDP) (https://github.com/nicolaroberts/hdp). Code and the input 
mutations are provided at https://github.com/HLee-Six/colon_microbiopsies. 
First, the algorithm was conditioned on the set of mutational 
signatures that have been found to be operative in colorectal cancers in 
PCAWG: SBS1, SBS2, SBS3, SBS5, SBS13, SBS16, SBS17a, SBS17b, SBS18, 
SBS25 (included although it is not found in colorectal cancer because 
its similarity to the mutational profile of crypts from one individual 
had been previously noted), SBS28, SBS30, SBS37, SBS40, SBS41, SBS43, 
SBS45, SBS49, DBS, DBS3, DBS4, DBS6, DBS7, DBS8, DBS9, DBS10, DBS11, 
ID1, ID2, ID3, ID4, ID5, ID6, ID7, ID8, ID10 and ID14. This allows simultaneous 
discovery of new signatures and matching to known ones. Nine SBS, 
two DBS and five ID signatures were discovered (Extended Data Fig. 2). 
Despite pre-conditioning, signatures that were perfectly correlated in 
all samples were still amalgamated. This occurred, for example, with 
SBS1, SBS5 and SBS18. Therefore, expectation maximization was used 
to deconvolute all HDP signatures into known PCAWG signatures. If a 
signature that was reconstituted from the components that expectation 
maximization extracted (only including PCAWG signatures that 
accounted for at least 10% of mutations in each sample to avoid overfitting) 
had a cosine similarity to the HDP signature of more than 0.95, the 
signature was presented as its expectation maximization deconvolution. 
Three HDP signatures met these criteria: the HDP SBS1 signature 
was deconvoluted into a mixture of PCAWG SBS1, PCAWG SBS5 and 
PCAWG SBS18; the HDP DBSA signature was deconvoluted into PCAWG 
DBS2, PCAWG DBS4, PCAWG DBS6, PCAWG DBS9 and PCAWG DBS11; 
and the HDP IDC signature was deconvoluted into PCAWG ID1, PCAWG 
ID2 and PCAWG ID5 (Extended Data Fig. 3). To test the robustness of 
this signature analysis, other signature-extraction methods were used: 
HDP with no pre-conditioning, the non-negative matrix factorization 
(NMF) method used in a previous study and a version of the NMF algorithm 
used in another study. These all produced comparable results 
(Extended Data Fig. 4). 
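
The acceptance test for a deconvolution can be sketched as follows. This is a toy illustration under stated assumptions: the expectation maximization step is not implemented and its output (the exposures) is taken as given, the renormalization of the retained components is an assumption, and the three-class signatures in the example are hypothetical.

```python
from math import sqrt
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two mutation-class profiles."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("profiles must be non-zero")
    return dot / (norm_a * norm_b)


def reconstitute(
    exposures: Dict[str, float],
    signatures: Dict[str, List[float]],
    min_fraction: float = 0.10,
) -> List[float]:
    """Mix the reference signatures that account for at least `min_fraction`."""
    kept = {name: w for name, w in exposures.items() if w >= min_fraction}
    total = sum(kept.values())  # assumption: retained weights are renormalized
    n_classes = len(next(iter(signatures.values())))
    mixture = [0.0] * n_classes
    for name, weight in kept.items():
        for i, prob in enumerate(signatures[name]):
            mixture[i] += (weight / total) * prob
    return mixture


def accept_deconvolution(
    hdp_signature: List[float],
    exposures: Dict[str, float],
    signatures: Dict[str, List[float]],
    threshold: float = 0.95,
) -> bool:
    """Accept if the reconstituted profile has cosine similarity above the threshold."""
    return cosine_similarity(hdp_signature, reconstitute(exposures, signatures)) > threshold


# Hypothetical three-class reference signatures and exposures.
refs = {"A": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0], "C": [0.0, 0.0, 1.0]}
weights = {"A": 0.58, "B": 0.38, "C": 0.04}
```

In this toy case, component C falls below the 10% cut-off and is dropped, and a profile of 0.6, 0.4 and 0.0 would be accepted as a mixture of A and B.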


Timing SBSA and SBSB throughout life 

Five patients had informative clades with branch points that allowed us 
to time SBSA. Plotting the cumulative amount of SBSA versus SBS1 at 
each node in these clades (Extended Data Fig. 6aq), we observed that 
for each, the rate of accumulation of SBSA relative to SBS1 was high in 
early branch points and then slowed down almost to zero on all branches 
but for one (a branch of patient ao; for this patient, SBSA continued to 
be acquired, albeit at a slow rate). 


We can take the inflexion point on the graph of cumulative SBSA versus 
SBS1 to be the upper limit of the point in time when SBSA slowed down. 
This provides an upper bound for three reasons. First, when we observe 
the presence of a signature on a branch, we know that the causative 
process must have been active at some point during the lifetime of 
the branch, but we cannot say when on the branch it occurs; it might 
have ended long before the branch did. Second, if the time to the most 
recent common ancestor of the crypt is longer than 0, the age at which 
this stopped would be earlier. Third, if the SBS1 mutation rate is increased 
in early life—as it may be during the rapid growth of the embryo—the age 
at which the inflexion point occurs would be earlier. 

Using these five informative clades, and assuming a clock-like but 
personalized rate of SBS1 accumulation (i.e., each patient can accumulate 
SBS1 at their own constant rate), we found that the upper bound of the 
age at which SBSA slowed was: 9.7 years (patient h); 7.1 years (patient 
z); 2.4 years (patient am); 20.1 years (patient aa); and 9.6 years (patient 
ao). There are no branches that begin after 10 years of age with a high 
ratio of SBSA to SBS1. 
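
One way to convert an SBS1 burden at a node into an age under a personalized clock is sketched below. It assumes that each patient's rate is estimated as the mean SBS1 burden at the tips of the clade divided by the age at sampling; this estimator, the function name and the example numbers are assumptions made for illustration and are not taken from the analysis code.

```python
from typing import List


def age_at_node(
    sbs1_at_node: float, sbs1_at_tips: List[float], age_at_sampling: float
) -> float:
    """Upper bound on the age at a node, assuming a constant per-patient SBS1 rate."""
    mean_tip_burden = sum(sbs1_at_tips) / len(sbs1_at_tips)
    rate_per_year = mean_tip_burden / age_at_sampling
    return sbs1_at_node / rate_per_year


# Hypothetical clade: tips carry about 600 SBS1 mutations at 60 years of age,
# so a node with 97 SBS1 mutations would be dated to about 9.7 years.
example_age = age_at_node(97, [600, 620, 580], 60)
```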

The most informative branch point is the earliest inflexion point; 
the estimate of 2.4 years from patient aa is therefore, perhaps, our best 
estimate. Nonetheless, we did not want to base our statement on a single 
patient, and so 10 years was given in the text as 4 patients had branches 
that ended before 10 years of age. 

A similar argument can be made for SBSB (Extended Data Fig. 6aq). 
For SBSB, however, only two clades were informative. The estimated 
upper bounds of age for SBSB activity were 2.4 years old (this was the 
same inflexion point as for SBSA in patient aa) and 6.4 years old (patient 
ai). For patient ai, a reasonable amount of SBSB is still acquired after this 
branch point. If the ratio of accumulation of SBSB versus SBS1 contin- 
ued at the same rate as before this branch point, the number of SBSB 
mutations seen in the terminal branches would have been observed 
by 8.2 years of age. 


Analysis of telomere length 

Telomere length was estimated from whole-genome-sequencing data 
using Telomerecat. Telomerecat is a Python-based software package for 
estimating telomere length from short-read whole-genome-sequencing 
data. It functions by classifying paired-end reads as either fully or partially 
telomeric on the basis of the canonical hexamer TTAGGG, and uses 
that ratio to estimate an average telomere length. Notably, Telomerecat 
measures telomeres and also accounts for interstitial telomeric repeats. 
It is ploidy and species agnostic (assuming that the telomere hexamer 
is the canonical mammalian signature of TTAGGG). Telomerecat has 
four main stages: 1) identification of all telomeric or partially telom- 
eric read pairs and creation of a subsetted bam file that contains only 
these reads; 2) classification of telomeric read-pairs as intratelomeric, 
boundary, junction-spanning or intrachromosomal; 3) error correc- 
tion of boundary or junction-spanning read pairs; and 4) estimation of 
telomere length based on the ratio of intratelomeric and boundary or 
junction-spanning read pairs. 

Telomerecat has been validated on whole-genome DNA-sequencing 
files from both tumour and normal samples. Its results show concordance 
with an established method of telomere length measurement, the 
mean telomere restriction fragment (mTRF) technique. Alternative packages 
are available, notably Computel, Telseq and TelomereHunter. 
They all have respective strengths and have been benchmarked through 
their methods publications. We have opted for Telomerecat as it provides 
an estimate of telomere length at base-pair resolution, while also correcting 
for variations in sequencing depth in a ploidy-agnostic manner. 

We ran Telomerecat on 445 crypt bam files with good coverage and 
clonality to generate estimates of telomere length. Telomerecat was 
threaded across 10 cores and 100 simulation cycles were requested 
per run. The values displayed are the median telomere length across 
all chromosomes in that sample, measured in base pairs. 


Statistical analyses 

All statistical analyses were performed in R (Supplementary Information). 
Code can be found at https://github.com/HLee-Six/colon_microbiopsies. 


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


Whole-genome and targeted sequencing data are deposited in the 
European Genome-phenome Archive (EGA) with accession codes 
EGAD00001004192 and EGAD00001004193. Images of microdissections 
and the physical distances between crypts are available on 
Mendeley Data (https://data.mendeley.com/datasets/zv6xrjxftw/1) 
by searching for the title of this article. All other data are available from 
the authors on request. 


Code availability 


Code for statistical analyses is provided as part of the Supplementary 
Information. Custom R scripts and their input data for signature analysis 
are available on GitHub at https://github.com/HLee-Six/colon_microbiopsies. 
All other code is available from the authors on request. 


Extended Data Fig. 1 | Laser-capture microdissection of crypts. 

a, Representative image of a section of colonic tissue. The magnified inset shows 
the section before and after dissection of a crypt. b,c, Coverage of crypts that 
underwent whole-genome (b) and targeted (c) sequencing. d, e, VAFs (that is, 

Extended Data Fig. 2 | Results of HDP-based extraction of signatures. Results 
of signature extraction using an HDP with pre-conditioning on signatures that 
are known to be active in colorectal cancer. For each signature, the extracted 
strongly are shown. Signatures are presented as in Fig. 2. The extraction of 
signatures using an HDP was followed by deconvolution by expectation 
maximization (Methods, Extended Data Fig. 3) to produce the versions of 
signatures that are presented in the main text. 

Extended Data Fig. 3 | Decomposition of HDP signatures by expectation 
maximization. Three signatures were decomposed (SBS1, DBSA and IDC). For 

Extended Data Fig. 4 | Validation of SBS signatures. a-c, Other methods 
of signature extraction were run to test the robustness of signature 
without pre-conditioning on PCAWG. c, NMF implemented by the 
MutationalPatterns package in R (Methods). 

Extended Data Fig. 5 | Linear modelling of the accumulation of signatures. 
For signatures that appeared to show a linear accumulation with age, the 
mutation rate per site was determined using mixed models, in which age and site 
were used as fixed effects and individual as a random effect. Confidence 
intervals were determined by bootstrapping. n = 445 crypts from 42 individuals. 
Solid lines represent the mean slope of the regression and shaded areas its 95% 
confidence intervals (CI95). 

Extended Data Fig. 6 | Crypt phylogenies. a–ap, For each individual, the 
phylogeny of crypts is shown three times: at the top, with branch lengths 
proportional to the number of SBSs; in the middle, with branch lengths 
proportional to the number of DBSs; and on the bottom, with branch lengths 
proportional to the number of small indels. Scale bars are shown on the right. 
A stacked bar plot of the mutational signatures that contribute to each branch is 
overlaid over every branch. ‘XO’ indicates mutations that could not confidently 
be assigned to any signature. Note that the ordering of signatures along a given 
branch is just for visualization purposes; we cannot distinguish the timing of 
different signatures along a branch. aq, The cumulative burden of SBSA (top) 
and SBSB (bottom) is plotted relative to the cumulative burden of SBS1 to time 
these mutational processes throughout life. Informative clades are shown (from 
patients labelled as in the rest of the figure), with every node and tip of the clade 
plotted in the space of the cumulative number of mutations that are due to a 
given signature that have occurred up until that node in the tree. Lines represent 
the branching structure of the tree. 

Extended Data Fig. 7 | Copy-number changes and structural variants in 
normal colon. a–d, A total of 449 crypts had sufficient coverage to be 
evaluated. a, Whole-chromosome amplifications in five crypts. The copy-number 
state (y axis) for each chromosome is shown, with one allele coloured 
red and the other green. Chromosomes are labelled along the top of the graph. 
b, Timing of copy-number changes throughout life. Vertical bars represent 95% 
confidence intervals, which were determined by bootstrapping. Horizontal bars 
represent the most likely time of the copy-number change, as defined by 
mutationTimeR (see Supplementary Information). c, Crypts with loss of 
heterozygosity (LOH). For each chromosome with a LOH event, the copy number 
across the whole chromosome is shown at the top, with the total copy number in 
black and the copy number for the minor allele in blue. The images at the bottom 
show example SNPs that support the LOH. In each case, reads from the crypt in 
question are shown above, and reads from its matched normal below. Thus, in 
the first image, the wild-type state (below) is heterozygous for a T SNP (red), 
whereas in the crypt in question (above), this polymorphism has now become 
homozygous. Small deviations from a fully homozygous state are probably a 
result of stromal contamination. d, Reads supporting structural variants in 
normal colon. Patients are labelled as in Extended Data Fig. 6. 

Extended Data Fig. 8 | Gain-of-function driver mutations in normal colon. 
Putative driver missense mutations in oncogene hotspots. The number of 
substitutions catalogued in COSMIC is shown on the y axis at each position 
along the gene, with the mutations that were observed in our cohort indicated 
with arrows. 

Extended Data Fig. 9 | Occurrence matrix of signatures and driver mutations 
in crypts. For all crypts that were whole-genome sequenced to sufficient depth, 
and for crypts that underwent targeted sequencing and in which driver 
mutations were found, the signatures and driver mutations are shown. Each 
vertical column represents a crypt. The individual to whom each crypt belongs 
is indicated by the alternating colours in the top bar (labelling as in Extended 
Data Fig. 6). The site to which each crypt belongs is shown underneath. The 
matrix is coloured by the contribution of each signature to each crypt, 
normalized for each signature: the crypt with the largest contribution of a given 
signature is purple and the crypt with the smallest contribution is white. Crypts 
in which the signatures could not be assessed, either because they underwent 
targeted sequencing or because the coverage was poor, are grey. Driver 
mutations, including heterozygous mutations in tumour suppressor genes, are 
indicated by a black bar. 

Extended Data Fig. 10 | Stem cell dynamics of normal colon. a, Number of 
stem cells and replacement rate of stem cells in normal human colonic crypts, as 
estimated by approximate Bayesian computation. Each point represents a 
simulation. Points are coloured according to their similarity to the observed 
data: the most similar 0.1% are coloured dark red, and so on, until the least similar 
simulations are blue. b, Approximate Bayesian computation of the rate of crypt 
fission (fissions per crypt per year) in the human colon. The prior distribution of 
the crypt fission rate (which was used to simulate many biopsies of the colon) is 
shown above, and the posterior distribution of the crypt fission rate (estimated 
by neural network regression on the simulations) is shown below. c,d, Evidence 
of crypt fusion in human colon. In each case, a phylogeny is shown at the top that 
depicts the genetic relationships between selected crypts. Dashed blue lines 
show mutations with a low allele fraction that are shared between crypts in a 
manner incompatible with the phylogeny dictated by the clonal mutations. 
Below each crypt in the phylogeny is an image that depicts its position in the 

Extended Data Fig. 11 | Comparison of the mutational signatures and driver 
landscape of normal crypts and colorectal adenocarcinomas. a, Comparison 
of the burden of mutations for every mutational signature. For each signature, 
the y axis shows the mutational burden +1 of every sample on a logarithmic scale. 
Normal colon and cancer samples are ordered within their groups. The signature 
attributions and mutational burden for colorectal adenocarcinoma are from a 
previous study. A total of 60 cancers are compared with 472 normal crypts. 
b, The proportion of driver mutations in each gene in normal colon (left) and 
colorectal cancer (right). The frequency of driver mutations in cancer was 
derived using data from TCGA research network (Supplementary Methods). 


Sample size Sample size was chosen based on the resources available. As this was a description of the landscape, rather than an attempt to test a 
particular hypothesis, a power calculation is not applicable. Our sample size was an order of magnitude larger than previous studies, which we 
hoped would be sufficient to yield new insights. 


Data exclusions Some crypts were excluded if their sequencing coverage and/or clonality were too poor for mutations to be called accurately. 
10X coverage was required for calling copy number changes. 
For statistical analyses in supplementary results 2, only crypts whose median depth multiplied by their median variant allele fraction was 
greater than 3 were included. 
These specific criteria were not pre-established, although it had always been expected that some coverage and clonality cut-offs would be 
required to obtain good quality data. 


Replication Our article describes the genomic landscape of the normal colon, it does not test specific hypotheses, and so replication does not apply in its 
usual way. 


Sequencing replicates are not normally possible as once a crypt has been sequenced it has been used up. 


Randomization Our article describes the genomic landscape of the normal colon, it does not test a treatment, and so randomisation does not apply. 


Population characteristics We obtained healthy colonic biopsies from four cohorts (Supplementary Table 1). The first represents seven deceased organ 
donors ranging in age from 36 to 67, from whom colonic and small intestinal biopsies were taken at the time of organ donation 
(REC 15/EE/0152). The second represents individuals aged 60 to 72 who were having a colonoscopy following a positive faecal 
occult blood test as part of the Bowel Cancer Screening Programme (Ethical approval 08-HO308-13); we selected 16 who were 
not found to have either an adenoma or a carcinoma on colonoscopy, and 15 who were found to have a colorectal carcinoma 
(the normal biopsies that we use were distant from these lesions). The third cohort represents three paediatric patients who 
underwent routine colonoscopy to exclude inflammatory bowel disease and who were found to have a completely normal 
intestinal mucosa macroscopically and histologically (REC 12/EE/0482). The final cohort included one 78 year-old gentleman with 
oesophageal cancer who underwent a warm autopsy (REC 13/EE/0043). This gentleman had been treated with palliative 
chemotherapy of Epirubicin, Oxaliplatin and Capecitabine within the three months before the autopsy; given the slow 
monoclonal conversion within crypts, mutations due to these chemotherapies are unlikely to be detected. All samples were 
obtained with informed consent and studies approved by East of England Research Ethics Committees. 


Recruitment The transplant donor cohort recruits those who are on the organ donor register and are suitable for transplantation. Donors 
must therefore have healthy organs. This should not affect the results of a survey of normal colonic epithelium. 


In the Bowel Cancer Screening programme, men and women aged 60-72 were offered a faecal occult blood test every 2 years. It 
is possible that there may be higher uptake by those who are more concerned about their health and therefore slightly healthier 
than the general population. Those who had a positive FOB are more likely to have colonic pathology, but all were found to have 
no cancer near the lesion at colonoscopy. On the whole, the recruitment of this cohort should not affect our results. 

One warm autopsy patient with oesophageal cancer consented for his body to be used for research as part of the Phoenix study. 
Without this sort of recruitment it would be very difficult to capture the extremes of age in our study. 


Children who had gastrointestinal symptoms suggestive of inflammatory bowel disease but a normal colonoscopy were included. 
While, as they had symptoms, they may be slightly less likely to have normal colons than the rest of the population, the 
colonoscopy excludes anything obvious. The most likely cause for their symptoms is infective, which is unlikely to affect the 
genomic landscape of the colon. 
The most common causes of chronic liver disease are excess alcohol intake, viral 
hepatitis and non-alcoholic fatty liver disease, with the clinical spectrum ranging in 
severity from hepatic inflammation to cirrhosis, liver failure or hepatocellular 
carcinoma (HCC). The genome of HCC exhibits diverse mutational signatures, 
resulting in recurrent mutations across more than 30 cancer genes” ’. Stem cells from 
normal livers have a low mutational burden and limited diversity of signatures®, which 
suggests that the complexity of HCC arises during the progression to chronic liver 
disease and subsequent malignant transformation. Here, by sequencing whole 
genomes of 482 microdissections of 100-500 hepatocytes from 5 normal and 9 
cirrhotic livers, we show that cirrhotic liver has a higher mutational burden than 
normal liver. Although rare in normal hepatocytes, structural variants, including 
chromothripsis, were prominent in cirrhosis. Driver mutations, such as point 
mutations and structural variants, affected 1-5% of clones. Clonal expansions of 
millimetres in diameter occurred in cirrhosis, with clones sequestered by the bands of 
fibrosis that surround regenerative nodules. Some mutational signatures were 
universal and equally active in both non-malignant hepatocytes and HCCs; some were 
substantially more active in HCCs than chronic liver disease; and others—arising from 
exogenous exposures—were present in a subset of patients. The activity of exogenous 
signatures between adjacent cirrhotic nodules varied by up to tenfold within each 
patient, as a result of clone-specific and microenvironmental forces. Synchronous 
HCCs exhibited the same mutational signatures as background cirrhotic liver, but with 
higher burden. Somatic mutations chronicle the exposures, toxicity, regeneration and 
clonal structure of liver tissue as it progresses from health to disease. 


Identifying somatic mutations in non-malignant tissue requires 
approaches that overcome the polyclonality of this tissue, such as single- 
cell sequencing”, cultures of single cells*”° or microbiopsy sequencing”. 
The latter relies on local cell division with limited migration leading 
to a clonal patchwork, which has been observed in liver tissue”. We 
generated whole-genome sequences from 482 laser-capture micro- 
dissections (LCMs) of 100-500 hepatocytes (Extended Data Fig. 1a) 
across 14 individuals: 5 healthy controls; 4 patients with cirrhosis from 
alcohol-related liver disease (ARLD) and 5 patients with cirrhosis from 
non-alcoholic fatty liver disease (NAFLD) (Supplementary Tables 1, 2, 
Extended Data Figs. 4-6). Samples of normal liver were acquired from 
hepatic resections of colorectal cancer metastases, and samples of cir- 
rhotic liver were taken from patients who underwent liver transplants 
for synchronous but distant HCC. 

To evaluate sensitivity and specificity, we generated independent 
libraries and sequencing data from different sections of the same biopsy, 
microdissecting the same x,y-region from adjacent z-stacks separated 
by around 20 µm. Concordance was high between variants that were 
called in adjacent sections, but not between distant pairs, suggesting 
that the specificity of mutation calls was high (Extended Data Fig. 1b). 
Sensitivity across patients ranged from 50 to 95%, depending on coverage 
and clonality (Extended Data Fig. 1c-f). As a further check on 
specificity, targeted deep sequencing of cancer genes from the same 
library as 96 whole-genome samples confirmed 16 of the 17 mutations 
that were originally called. In keeping with polyploidy as a late stage of 
differentiation in the liver”, 20-25% of mature hepatocytes in microdis- 
sected samples were multinuclear (Extended Data Fig. 1g). We therefore 
deployed copy-number algorithms with an expected ploidy of 4, and 
report mutational burdens per diploid genome, rather than per cell. 
We observed considerable heterogeneity in the burden of somatic 
substitutions both between and within patients (Fig. 1a, Supplemen- 
tary Tables 3, 4). Using mixed-effects models, microdissections from 
Fig. 1 | Mutational burden observed in non-cancerous hepatocytes. 

a, Burden of single-nucleotide variants (SNVs), corrected by sensitivity of 
mutation detection. Each box plot represents a patient (n=14 patients; 482 
microdissections) and each dot represents one laser-capture microdissected 
sample. The grey-to-black intensity of the points reflects the median variant 
allele fraction (VAF) of mutations in each microdissection. Boxes in the box plots 
indicate median and interquartile range; whiskers denote range. b, Burden of 
indel variants (n= 14 patients; 482 microdissections). c, Burden of copy-number 
variants (CNVs) and structural variants (SVs), represented as the number of 
unique events per patient. d, Chromothripsis involving chromosomes 16 and 21, 
observed in patient PD37111. Black points represent corrected read depth along 
the chromosome. Lines and arcs represent structural variants, coloured by the 
orientation of the joined ends (purple, tail-to-tail inverted; brown, head-to-head 
inverted; turquoise, tandem-duplication-type orientation; green, deletion-type 
orientation). e, Chromothripsis involving chromosomes 1 and 3, observed in 
patient PD37105. f, Chromothripsis involving chromosomes 2, 5 and 6, observed 
in patient PD37105 (in a separate clone to e). 


cirrhotic livers had, on average, 1,251 (95% confidence interval, 233- 
2,268; P= 0.02) extra substitutions per diploid genome compared to 
normal livers, independent of age. In accordance with published values®, 
the estimated rate of accumulation of mutations was 33 per year per 
diploid genome, albeit with wide confidence intervals (95% confidence 
interval, -17 to 84; P=0.18) and moderate variation between individuals 
(estimated between-individual s.d., 13 per year). Insertions and deletions 
(indels) showed the same heterogeneity between and within individuals 
as substitutions (Fig. 1b). 
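
As a rough consistency check on the interval and P value quoted above for the excess burden in cirrhotic liver, the following minimal sketch back-calculates a two-sided P value from a 95% confidence interval. It assumes a symmetric interval and a normal approximation, which is not necessarily how the mixed-effects models produced the reported figures; the function name is hypothetical.

```python
import math

def p_from_ci(estimate, ci_low, ci_high, z_crit=1.959964):
    """Two-sided P value implied by a symmetric 95% CI (normal approximation).

    Assumption: the interval is estimate +/- z_crit * standard error.
    """
    standard_error = (ci_high - ci_low) / (2 * z_crit)
    z = abs(estimate) / standard_error
    # For a standard normal, P(|Z| > z) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2))

# Excess substitutions in cirrhotic versus normal liver, as quoted in the text.
p_burden = p_from_ci(1251, 233, 2268)
print(f"implied P value: {p_burden:.3f}")
```

Under these assumptions the implied value rounds to the reported P = 0.02; small differences for other estimates would be expected from rounding and from the exact test used.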

Structural variants and copy-number alterations occurred in mod- 
erate numbers across all nine patients with liver cirrhosis, despite 
being rare in normal liver (Fig. 1c, Extended Data Fig. 2, Supplemen- 
tary Tables 3, 4). Occasional aneuploidy at whole-chromosome or arm 
level occurred, as well as focal events including deletions, tandem 
duplications and unbalanced translocations (Extended Data Fig. 2). We 
found five separate clusters of structural variants across three patients, 
with patterns indicative of chromothripsis™ (Fig. 1d-f, Extended Data 
Fig. 2). Chromothripsis, in which multiple rearrangements occur in a 
single catastrophic mitosis“, is a major process of mutation in cancers 
(occurring in around 5% of HCCs*), but is rare in normal somatic cells. 
Our observation of 1-2% of clones with chromothripsis in chronic liver 
disease suggests that sustained toxicity and regeneration substantially 
increase mitotic stress in hepatocytes. 

We screened for driver mutations among coding regions, 5’-untranslated 
regions (UTRs), 3’-UTRs and promoters (Supplementary Tables 5-8). 
No elements were significant genome-wide after correcting for multiple 
hypotheses, so we focused on the 30 most-prevalent HCC genes’ ®. These 
carried 22 non-synonymous variants that were seen in both normal and 
cirrhotic samples and included inactivating mutations in the tumour 
suppressor genes ACVR2A, ARID2, ARID1A and TSC2 (Extended Data 
Fig. 3a). When hypothesis testing was restricted to these 30 genes, ALB 
and ACVR2A were significant (q = 0.001 and q = 0.001, respectively). 
Recurrence in ALB (which encodes the protein albumin) probably 
reflects a mutational process in which indels preferentially occur in 
highly expressed genes, as reported in HCCs>"* (Extended Data Fig. 3b, c). 
Assuming no negative selection, we can use the ratio of non-synonymous 
to synonymous substitutions for the 30 HCC genes to estimate the num- 
ber of driver substitutions among them”; this gives a 95% confidence 
interval of 0.0-13.2 driver mutations in total across 482 microdissections 
(that is, less than 3%). Among copy-number aberrations of potential 
importance’*"* (Supplementary Table 9), we found instances of loss of 
chromosomes 22 and 8p, and gain of chromosome 8q. Two focal dele- 
tions in different patients spanned ACVR2A (Extended Data Fig. 2c, e). 
We also found a reciprocal inversion that deleted CDKN2A (Extended 
Data Fig. 2f), the most common focal deletion in HCC, and a deletion 
that affected ARIDSA. 
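
The driver estimate in this paragraph can be sketched as follows. If the ratio of non-synonymous to synonymous substitutions (dN/dS, written `dn_ds` here) exceeds 1 and negative selection is assumed absent, the excess fraction (dN/dS - 1)/dN/dS of non-synonymous substitutions is attributed to drivers. The function names and the example dN/dS value are hypothetical; the published analysis also models sequence context and provides confidence intervals, which this sketch does not.

```python
def expected_drivers(n_nonsynonymous, dn_ds):
    """Expected number of driver substitutions among non-synonymous calls.

    Assumes no negative selection, so any dN/dS at or below 1 implies zero drivers.
    """
    if dn_ds <= 1.0:
        return 0.0
    return n_nonsynonymous * (dn_ds - 1.0) / dn_ds

def drivers_per_sample(n_drivers, n_samples):
    """Upper bound on the share of samples carrying a driver (one driver per sample)."""
    return n_drivers / n_samples

# Hypothetical dN/dS of 2.0 applied to the 22 non-synonymous variants in the text.
print(expected_drivers(22, 2.0))
# Upper limit of the reported interval (13.2) spread over 482 microdissections.
print(f"{drivers_per_sample(13.2, 482):.1%}")
```

The second call reproduces the "less than 3%" bound stated in the text from the reported upper confidence limit.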

We reconstructed phylogenetic trees” and layered them onto the 
histology of the specimens. Samples from healthy controls showed the 
highly polyclonal nature of normal liver, with little genetic relatedness 
among even closely located microdissections (Fig. 2a-d, Extended 
Data Fig. 4). Samples from patients with chronic liver disease showed 
a clonal structure that was more complex, from which three general 
inferences can be drawn (Fig. 2e-p, Extended Data Figs. 5, 6). First, we 
found no sharing of mutations between adjacent liver nodules separated 
by fibrotic bands. This suggests that the connective tissue that is laid 
down during cycles of damage and regeneration sequesters clones from 
the early stages of the disease process. Second, some cirrhotic nodules 
were monoclonally derived (Fig. 2j, n, for example), whereas others 
were oligoclonal (Fig. 2f), and shared mutations often extended across 
microdissections that were millimetres apart. Third, branching struc- 
tures in phylogenies point to subclonal diversification within nodules. 
Within such a clone, the proportion of shared, clonal mutations on the 
trunk relative to those on the subclonal branches gives an estimate in 
molecular time of when the most-recent common ancestor of the clone 
emerged. In some patients (for example, patient PD37114; Fig. 2i, j), 
the common ancestor of individual nodules emerged relatively early 
in molecular time, whereas in others (for example, patient PD37116; 
Fig. 2m, n), the common ancestor appeared much more recently. As 
the majority of liver cells do not have driver mutations, the size and 
rapidity of clonal expansions observed here demonstrate the consid- 
erable intrinsic capacity of hepatocytes to regenerate in response to 
liver damage. 
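
The molecular-time argument above can be illustrated with a minimal sketch: the share of a clone's mutations that sit on the shared trunk approximates how far through molecular time the most-recent common ancestor arose. The function name and the mutation counts are hypothetical, and averaging the branch lengths is a simplifying assumption rather than the authors' stated procedure.

```python
def mrca_molecular_time(trunk_mutations, branch_mutations):
    """Fraction of molecular time elapsed when the common ancestor emerged.

    trunk_mutations: mutations shared by every sample in the clone.
    branch_mutations: list of private (subclonal) mutation counts, one per branch.
    """
    if not branch_mutations:
        raise ValueError("at least one subclonal branch is required")
    mean_branch = sum(branch_mutations) / len(branch_mutations)
    return trunk_mutations / (trunk_mutations + mean_branch)

# Hypothetical nodule whose ancestor arose early: short trunk, long branches.
early = mrca_molecular_time(300, [1200, 1100, 1300])
# Hypothetical nodule whose ancestor arose recently: long trunk, short branches.
late = mrca_molecular_time(1350, [150, 120, 180])
print(early, late)
```

A value near 0 corresponds to an ancestor that emerged early in molecular time, and a value near 1 to a recent expansion.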
Fig. 2 | Phylogenetic reconstruction of hepatocyte clones. a, Phylogenetic 
tree constructed from clustering of mutations across microdissected samples 
in a healthy individual (PD36715). Lengths of branches (x axis) indicate the 
numbers of mutations assigned to that branch. Solid lines indicate that nesting 
is in accordance with the pigeonhole principle; dashed lines indicate that 
nesting is in accordance with the pigeonhole principle, assuming that 
hepatocytes represent 70% of cells; dotted lines indicate that nesting is only 
based on clustering (clones are assigned as nested if the VAFs of constituent 
microdissections are lower than those in the parental clone). b, Representation 
of branches from the phylogenetic tree in a according to their physical 
coordinates, overlaid onto a haematoxylin and eosin (H&E)-stained section. 
Black points represent branches of the tree that share no mutations with any 
other samples; coloured points represent branches with shared clonal 
relationships (n= 26 microdissections). c, d, A second healthy individual 
(PD36713; n= 30 microdissections). e, f, Patient with ARLD (PD37105; n= 31 
microdissections). g, h, Patient with ARLD (PD37110; n= 22 microdissections). 
i, j, Patient with NAFLD (PD37114; n= 41 microdissections). k, l, Patient with 
NAFLD (PD37115; n= 34 microdissections). m, n, Patient with NAFLD (PD37116; 43 
microdissections). o, p, Patient with NAFLD (PD37118; 26 microdissections). 
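
The pigeonhole rule named in the caption can be sketched as follows: if two mutation clusters in the same microdissection together occupy more cells than are available, some cells must carry both, so the smaller cluster is nested in the larger. This is a minimal sketch that assumes heterozygous mutations in diploid regions (cell fraction = 2 × VAF); the function name is hypothetical and the authors' exact implementation is not given in this section.

```python
def must_be_nested(vaf_parent, vaf_child, hepatocyte_fraction=1.0):
    """Return True if the pigeonhole principle forces the child into the parent.

    hepatocyte_fraction is the share of cells assumed to be hepatocytes
    (1.0 for the strict rule, 0.7 for the relaxed rule in the caption).
    """
    cell_fraction_parent = 2 * vaf_parent  # assumes heterozygous, diploid
    cell_fraction_child = 2 * vaf_child
    overfull = cell_fraction_parent + cell_fraction_child > hepatocyte_fraction
    return vaf_child <= vaf_parent and overfull

print(must_be_nested(0.35, 0.20))                           # strict rule
print(must_be_nested(0.20, 0.20))                           # strict rule
print(must_be_nested(0.20, 0.20, hepatocyte_fraction=0.7))  # relaxed rule
```

The three calls correspond to a nesting supported by the strict rule, one that is not, and one supported only under the 70% hepatocyte assumption.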


A major debate in the modelling of cancer development is whether 
cancers need higher rates of mutation to acquire sufficient driver muta- 
tions. We compared the mutational burden in cirrhotic liver to synchro- 
nous, clonally unrelated HCCs from seven patients. Synchronous HCCs 
carried, on average, 4,600 more mutations than matched cirrhotic liver 
(95% confidence interval, 3,600-5,500; P < 10^-8 using linear mixed-effect 
models; Fig. 3a). This indicates that rates of mutation increase during 
malignant transformation, either through cancer-specific mutational 
processes or through greater activity in cancers of ubiquitous muta- 
tional processes. 

To assess which mutational processes are active in cirrhosis, 
we extracted mutational signatures from our 482 microdissections, 
Fig. 3 | Mutational signatures in normal liver, cirrhotic liver and HCC. 

a, Number of somatic substitutions (SNVs; sensitivity-corrected for non- 
cancerous samples) and indels in each non-cancer microdissection sample (blue 
circles) and associated synchronous HCCs (red diamonds). b-e, Estimated 
proportional contributions of each mutational signature to each 
phylogenetically defined cluster of somatic substitutions. Data were generated 
using a Bayesian hierarchical Dirichlet process. Unattrib., unattributed. Stacked 
bar plots show proportional contributions of signatures in healthy individuals (b), 
patients with ARLD (c), patients with NAFLD (d) and 54 cases of HCC from TCGA! 
(e). f, Number of SNVs attributed to prevalent mutational signatures in each non- 
cancer microdissection sample (blue circles) and synchronous HCCs (red 
diamonds). Contributions for the TCGA samples are shown on the right. The 
y axis is on a logarithmic scale. 


as well as from the 7 synchronous HCCs and 54 HCC genomes from 
The Cancer Genome Atlas (TCGA)', using two independent algorithms 
(Fig. 3b-e, Extended Data Figs. 7, 8). Three major groups of mutational 
signatures emerged: first, those that are ubiquitous and similarly active 
across cirrhosis and HCC; second, those that are minor contributors in 
cirrhosis but universally more active in HCC; and third, those that are 
active in some patients but absent in others, including signatures that 
arise from exogenous stimuli. 

In normal and cirrhotic liver, ubiquitous mutational signatures (5 and 
A) were prevalent across clones and, in combination, typically accounted 
for more than 75% of mutations. Signature 5 is widespread across 
cancers—including HCCs?*”°—and accumulates linearly with age, 
suggesting that it arises from endogenous mutational processes. 
Signature A is the dominant cause of mutations in normal blood stem 
cells’©” and leukaemias”, which indicates that it too arises endogenously. 
In HCCs, although signature A accounted for a lower proportion 
of mutations than in normal or cirrhotic liver, the absolute 
numbers of mutations attributed to signature A were comparable 
(difference between cancer and non-cancer, 60 mutations; 95% con- 
fidence interval, -80 to 200; P= 0.4; Fig. 3f, Supplementary Table 10). 
This suggests that signature A is active in hepatocytes throughout 
life, but is outstripped in HCC by mutational processes that emerge 
during malignant transformation. 

A second group of mutational signatures comprises processes 
that are relatively quiet in cirrhotic liver but universally more active 
in HCC (signatures 1, 12, 16, 40 and a new signature, D; Supplementary 
Table 10). One of these, signature 16, consists of T to C mutations in 
the ApT context and has a known transcriptional-strand bias, which 
includes both the preferential repair of damaged adenines on transcribed 
strands and increased damage on non-transcribed strands”. 
Although this signature is more active in HCCs, we do see its characteristic 
transcriptional-strand bias in cirrhotic liver (Extended Data 
Fig. 9a). Signature 1, which is caused by spontaneous deamination 
of methylated cytosine to thymine, is also much more active in HCC 
than non-malignant liver. The acceleration and universality of these 
signatures in HCC suggests that they reflect inbuilt DNA damage and 
repair processes in hepatocytes that are unmasked during malignant 
transformation. 
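
To make the sequence context of signature 16 concrete, the sketch below tests whether a single substitution is a T to C change in the ApT context. It follows the common convention of reporting substitutions from the pyrimidine strand, so an A to G change is reverse-complemented first. The function names are hypothetical and this is not the signature-extraction method used in the study.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def to_pyrimidine_context(trinucleotide, alt):
    """Express a substitution relative to the strand with a C or T reference base.

    trinucleotide: 5' base, mutated base, 3' base on the reference strand.
    alt: the alternative base on the same strand.
    """
    if trinucleotide[1] in "CT":
        return trinucleotide, alt
    reverse_complement = "".join(COMPLEMENT[b] for b in reversed(trinucleotide))
    return reverse_complement, COMPLEMENT[alt]

def is_apt_t_to_c(trinucleotide, alt):
    """True for a T>C substitution whose 5' neighbour is A (the ApT context)."""
    context, alt_base = to_pyrimidine_context(trinucleotide, alt)
    return context[0] == "A" and context[1] == "T" and alt_base == "C"

print(is_apt_t_to_c("ATG", "C"))  # T>C preceded by A
print(is_apt_t_to_c("CAT", "G"))  # A>G, equivalent to T>C in ATG on the other strand
print(is_apt_t_to_c("CTG", "C"))  # T>C preceded by C
```

Keeping track of which strand carried the original purine or pyrimidine, relative to the direction of transcription, is what allows the transcriptional-strand bias described above to be measured.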

The third group of mutational processes represents signatures that 
are seen sporadically across the cohort and that are frequently caused 
by exogenous factors. One, signature 4, is found in lung cancers from 
smokers”, and also in HCCs, albeit with a less clear-cut relationship 
to tobacco’. Of our 14 patients, 4 had more than 10% of microdissec- 
tions in which more than 5% of mutations were attributed to signature 
4, demonstrating the expected transcriptional-strand bias of this sig- 
nature on guanines (Extended Data Fig. 9b). Not only did signature 
4 show considerable patient-to-patient heterogeneity, but there was 
also unexpectedly high clone-to-clone and nodule-to-nodule variabil- 
ity within individual livers. In one patient, for example, about half the 
clones we sequenced had 2,000-4,000 mutations, whereas the other 
half had 8,000-12,000; these differences were driven by the presence 
or absence of signature 4 (PD37111; Fig. 4a). 
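
The two-level threshold used above to decide whether signature 4 is present in a patient (more than 10% of microdissections with more than 5% of mutations attributed to it) can be written as a short sketch. The function name and the example fractions are hypothetical; only the two thresholds come from the text.

```python
def signature_present(fractions, per_sample_threshold=0.05, sample_share_threshold=0.10):
    """Apply the two thresholds to one patient's microdissections.

    fractions: for each microdissection, the fraction of its mutations
    attributed to the signature of interest.
    """
    if not fractions:
        raise ValueError("at least one microdissection is required")
    n_flagged = sum(1 for f in fractions if f > per_sample_threshold)
    return n_flagged / len(fractions) > sample_share_threshold

# Hypothetical patient with two clearly exposed clones out of five.
print(signature_present([0.00, 0.02, 0.30, 0.40, 0.01]))
# Hypothetical patient with no microdissection above 5%.
print(signature_present([0.01] * 10))
```

Because the rule is evaluated per microdissection, a patient can qualify even when most clones show no activity, which matches the clone-to-clone variability reported here.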

This within-patient regional variability extended to other exogenous 
factors. In one patient (PD37107), 20-35% of mutations were derived 
from signature 22 (Fig. 4b, Extended Data Fig. 9c), which is characteristic 
of exposure to aristolochic acid”. This patient grew up in Poland and 
spent time on holiday in Balkan states where exposure to aristolochic 
acid is common”. In a different patient (PD36714), a subset of microdissections 
had 10-20% of mutations that were attributable to signature 24 
(Fig. 4c), which is associated with aflatoxin-B1 exposure’. Aflatoxin-B1 is 
produced by Aspergillus moulds that contaminate crops, and biomarkers 
of exposure to this toxin are prevalent in arable farmers*—the occupa- 
tion of our patient. In both patients, these carcinogens showed notable 
variability in mutational activity over short distances, generating few 
mutations in some clones and hundreds to thousands in others. This 
regional variation in the activity of exogenous signatures is unexpected, 
and so far unexplained. 

In one patient, we found a large clone that carried more than 2,000 
mutations attributable to signature 9 (Fig. 4d)—a result of off-target 
somatic hypermutation in B lymphocytes”. A clonotypic rearrangement 
of IGH was evident, which is consistent with the notion that a 
single B lymphocyte subclonally diversified as it expanded in the liver 
(Extended Data Fig. 10). Signature 9 was only present on the ancestral 
trunk, whereas signatures in the subclones (acquired in the liver) were 
distributed in a similar manner to hepatocytes, suggesting that the 
hepatic microenvironment shaped the ongoing mutational processes 
in the lymphocytes. 

In conclusion, then, non-malignant liver has considerably lower 
proportions of clones (less than 5%) with driver point mutations or 
structural variants than oesophagus or skin"°”’, and those present 
were seen in both normal and cirrhotic liver. In the cirrhotic liver, 
fibrosis isolated these clones, either with or without driver muta- 
tions, restricting their expansion. Moreover, driver mutations were 
not shared with distant synchronous HCCs, which suggests that the 
increased risk of cancer in chronic liver disease arises from a myriad 
of clones that compete independently to acquire sufficient driver 
mutations. Mutations in the TERT promoter are likely to be key events 
in the progression to HCC; we did not identify any TERT promoter 
mutations in cirrhotic or normal liver, but they are seen in dysplastic 
hepatic nodules'*”’. The low proportion of clones with driver mutations 
that we observed here, and that has also been shown in exome studies 
performed elsewhere”’”°, means that much larger sample sizes will be 
needed to comprehensively map how driver mutations accumulate in 
the progression from normal liver through regenerative and dysplastic 
nodules to HCC. 

These data reveal the genomic consequences of chronic liver disease— 
increased rates of mutation, complex structural variation (including 
chromothripsis and aneuploidies) and a low burden of mutations that 
target known HCC genes. Genomically, one middle-aged, healthy liver 
looks much like any other: a community of small, tightly packed clones, 
each comprising a few hundred cells and containing around 1,000-1,500 
mutations that come from a limited palette of signatures. Unhealthy 
livers diverge from this norm and instead exhibit large dynasties of 
clones, which are sequestered by bands of fibrosis and have a repertoire 
of signatures that is more variable, more vigorous and more regionally 
variegated. 


542 | Nature | Vol 574 | 24 OCTOBER 2019 


Online content 


Any methods, additional references, Nature Research reporting summa- 
ries, source data, extended data, supplementary information, acknowl- 
edgements, peer review information; details of author contributions 
and competing interests; and statements of data and code availability 
are available at https://doi.org/10.1038/s41586-019-1670-9. 


1. Cancer Genome Atlas Research Network. Comprehensive and integrative genomic 
characterization of hepatocellular carcinoma. Cell 169, 1327-1341 (2017). 
2. Schulze, K. et al. Exome sequencing of hepatocellular carcinomas identifies new 
mutational signatures and potential therapeutic targets. Nat. Genet. 47, 505-511 (2015). 
3. Totoki, Y. et al. Trans-ancestry mutational landscape of hepatocellular carcinoma 
genomes. Nat. Genet. 46, 1267-1273 (2014). 
4. Fujimoto, A. et al. Whole-genome sequencing of liver cancers identifies etiological 
influences on mutation patterns and recurrent mutations in chromatin regulators. Nat. 
Genet. 44, 760-764 (2012). 
5.  Letouzé, E. et al. Mutational signatures reveal the dynamic interplay of risk factors and 
cellular processes during liver tumorigenesis. Nat. Commun. 8, 1315 (2017). 
6. Kan, Z. et al. Whole-genome sequencing identifies recurrent mutations in hepatocellular 
carcinoma. Genome Res. 23, 1422-1433 (2013). 
7. Guichard, C. et al. Integrated analysis of somatic mutations and focal copy-number 
changes identifies key genes and pathways in hepatocellular carcinoma. Nat. Genet. 44, 
694-698 (2012). 
8. Blokzijl, F. et al. Tissue-specific mutation accumulation in human adult stem cells during 
life. Nature 538, 260-264 (2016). 
9.  Lodato, M. A. et al. Aging and neurodegeneration are associated with increased 
mutations in single human neurons. Science 359, 555-559 (2018). 

10. Lee-Six, H. et al. Population dynamics of normal human blood inferred from somatic 
mutations. Nature 561, 473-478 (2018). 

11. Martincorena, I. et al. High burden and pervasive positive selection of somatic mutations 
in normal human skin. Science 348, 880-886 (2015). 

12. Fellous, T. G. et al. Locating the stem cell niche and tracing hepatocyte lineages in human 
liver. Hepatology 49, 1655-1663 (2009). 

13. Sigal, S. H. et al. Partial hepatectomy-induced polyploidy attenuates hepatocyte 
replication and activates cell aging events. Am. J. Physiol. 276, G1260-G1272 (1999). 

14. Stephens, P. J. et al. Massive genomic rearrangement acquired in a single catastrophic 
event during cancer development. Cell 144, 27-40 (2011). 

15. Fernandez-Banet, J. et al. Decoding complex patterns of genomic rearrangement in 
hepatocellular carcinoma. Genomics 103, 189-203 (2014). 

16. Imielinski, M., Guo, G. & Meyerson, M. Insertions and deletions target lineage-defining 
genes in human cancers. Cell 168, 460-472 (2017). 

17. Martincorena, I. et al. Universal patterns of selection in cancer and somatic tissues. Cell 
171, 1029-1041 (2017). 

18. Torrecilla, S. et al. Trunk mutational events present minimal intra- and inter-tumoral 
heterogeneity in hepatocellular carcinoma. J. Hepatol. 67, 1222-1231 (2017). 

19. Nik-Zainal, S. et al. The life history of 21 breast cancers. Cell 149, 994-1007 (2012). 

20. Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 
415-421 (2013). 

21. Osorio, F. G. et al. Somatic mutations reveal lineage relationships and age-related 
mutagenesis in human hematopoiesis. Cell Rep. 25, 2308-2316 (2018). 

22. Haradhvala, N. J. et al. Mutational strand asymmetries in cancer genomes reveal 
mechanisms of DNA damage and repair. Cell 164, 538-549 (2016). 

23. Poon, S.L. et al. Genome-wide mutational signatures of aristolochic acid and its 
application as a screening tool. Sci. Transl. Med. 5, 197ra101 (2013). 

24. Scelo, G. et al. Variation in genomic landscape of clear cell renal cell carcinoma across 
Europe. Nat. Commun. 5, 5135 (2014). 

25. Rushing, B. R. & Selim, M. I. Aflatoxin B1: a review on metabolism, toxicity, occurrence in 
food, occupational exposure, and detoxification methods. Food Chem. Toxicol. 124, 
81-100 (2019). 

26. Martincorena, I. et al. Somatic mutant clones colonize the human esophagus with age. 
Science 362, 911-917 (2018). 

27. Yokoyama, A. et al. Age-related remodelling of oesophageal epithelia by mutated cancer 
drivers. Nature 565, 312-317 (2019). 

28. Nault, J.C. et al. Telomerase reverse transcriptase promoter mutation is an early somatic 
genetic alteration in the transformation of premalignant nodules in hepatocellular 
carcinoma on cirrhosis. Hepatology 60, 1983-1992 (2014). 

29. Kim, S. K. et al. Comprehensive analysis of genetic aberrations linked to tumorigenesis in 
regenerative nodules of liver cirrhosis. J. Gastroenterol. 54, 628-640 (2019). 

30. Zhu, M. et al. Somatic mutations increase hepatic clonal fitness and regeneration in 
chronic liver disease. Cell 177, 608-621 (2019). 


Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in 
published maps and institutional affiliations. 


© The Author(s), under exclusive licence to Springer Nature Limited 2019 


Methods 


Data reporting 

No statistical methods were used to predetermine sample size. The 
experiments were not randomized and, unless otherwise stated, the 
investigators were not blinded to allocation during experiments and 
outcome assessment. 


Samples 

Patients recruited at Addenbrooke’s Hospital, Cambridge gave written 
informed consent with approval of the Local Research Ethics Commit- 
tee (16/NI/0196). 

Normal liver samples were obtained from patients with liver metas- 
tases from colorectal carcinoma. The liver specimens were obtained 
from resected liver distal to the metastases, and were confirmed to 
be free of tumour cells by histology. None of the patients had under- 
gone neoadjuvant systemic therapy; one patient had undergone pre- 
operative portal vein embolization (PD36718) to the ipsilateral liver 
lobe. Liver tissue from patients with chronic liver disease was derived 
from explanted diseased livers at the time of transplantation. All of 
the patients were identified as having ARLD or NAFLD by their clini- 
cal history with the transplant hepatology and addiction psychiatry 
teams, as well as by explanted liver histology. None of the patients had 
undergone transarterial chemoembolization or other locoregional 
therapy on the transplant waiting list, except PD37118, who underwent 
a single treatment to their HCC with transarterial chemoembolization. 
All of the patients with chronic liver disease, except one (PD37105), 
demonstrated substantial pre-operative impairment of liver function 
as evidenced by a UK model for end-stage liver disease (UKELD) score 
of higher than 50. 

The explant liver histology was reviewed by a specialist liver histo- 
pathologist (S.E.D.), blinded to the sequencing results. The normal liver 
specimens had no fibrosis and no evidence of chronic liver disease; the 
explanted diseased livers uniformly demonstrated cirrhosis and HCC. 
The background liver histology was scored according to the Kleiner sys- 
tem” on formalin-fixed paraffin-embedded (FFPE) samples away from 
the HCC and the fresh-frozen block used for the sequencing analysis. 
The Kleiner score assesses the presence of steatosis, lobular inflam- 
mation and hepatocyte ballooning to generate a cumulative NAFLD 
activity score (NAS). The presence or absence of cellular or nodular 
dysplasia was assessed globally in clinical FFPE samples (Supplemen- 
tary Table 1), as well as specifically in the fresh-frozen block used for 
the LCM and sequencing (Supplementary Table 1). Serial H&E-stained 
sections from the frozen block did not demonstrate dysplasia in any of 
the cases (Supplementary Table 1). There was no evidence of CRC or HCC 
on histological review of the fresh-frozen block used for sequencing. 

All tissue samples were snap-frozen in liquid nitrogen and stored at 
-80 °C in the Human Research Tissue Bank of the Cambridge University 
Hospitals NHS Foundation Trust. 


Preparation of tissue sections 

Tissue biopsies were embedded in optimal cutting temperature (OCT) 
medium (Thermo Fisher Scientific) at -25 °C. Sections were cut at a 
thickness of 20 µm using a Leica cryotome and transferred onto polyethylene 
naphthalate (PEN) membrane slides (Thermo Fisher Scientific). 
For fixation, slides were treated with 70% ethanol at room temperature 
for 2 min. Slides were washed twice in 10% phosphate-buffered saline 
(PBS) at room temperature for 10 s. For staining, slides were incubated 
in haematoxylin for 10 s and rinsed twice in water. Slides were then incubated 
in eosin for 5 s and rinsed once in water. Slides were washed twice 
with 70% ethanol for 5 s, twice with 100% ethanol for 5 s and in xylene 
for 5 s. Storage was at -20 °C. Additional sections were stained for H&E, 
Masson’s trichrome and Oil Red O by standard laboratory techniques. 
All slides were scanned on a Leica AT2 at ×20 magnification and a resolution 
of 0.5 µm per pixel. 


Laser-capture microdissection 

Microdissection was performed using a laser-capture microscope (Leica 
Microsystems LMD 7000). For each biopsy, 48 microdissections were 
cut with a target size of 20,000 µm², which corresponds to about 400 
hepatocyte cells. Images were taken before and after LCM. 


Sample lysis and DNA preparation 

LCM biopsies were lysed using the Arcturus PicoPure DNA Extraction Kit 
(Thermo Fisher Scientific) following the manufacturer’s instructions. 
DNA libraries for Illumina sequencing were prepared using a protocol 
optimized for low input amounts of DNA, as described”. 


Whole-genome sequencing 

Paired-end sequencing reads (150 bp) were generated using the Illumina 
X10 platform for 400 samples, resulting in a target coverage of 
30x-70x per sample. To avoid the known index-hopping artefact, we 
chose to avoid multiplexing samples and instead sequenced one sample 
per flow-cell lane. To increase coverage for a subset of 96 samples, we 
used multiplexing and achieved 70x coverage. In addition to the LCM 
samples, we also sequenced a bulk sample for each biopsy and (where 
available) associated HCC. 
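
For orientation, the relationship between sequencing yield and the coverage figures quoted above can be sketched as total sequenced bases divided by genome size. The function name, the read-pair count and the approximate genome size of 3.1 Gb are illustrative assumptions, and the calculation ignores duplicates, unmapped reads and mate overlap.

```python
def mean_coverage(read_pairs, read_length=150, genome_size=3.1e9):
    """Approximate mean depth from the number of paired-end read pairs.

    Each pair contributes two reads of read_length bases.
    """
    return read_pairs * 2 * read_length / genome_size

# Hypothetical lane yielding 400 million read pairs of 150 bp.
print(f"{mean_coverage(400_000_000):.1f}x")
```

With these illustrative numbers, a single lane falls within the 30x-70x range stated in the text.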

The healthy liver samples came from wide resections of hepatic 
metastases of colorectal cancer. In each case, we sequenced the metas- 
tasis; this did not reveal any mutations that were shared between the 
colorectal cancer and liver, or any variants that were shared by all liver 
samples absent from the colorectal cancer (beyond regions of loss of 
heterozygosity in the cancer). Likewise, for the cirrhotic liver samples, 
we sequenced the matched HCC, which did not reveal any sharing of 
mutations. In one case, we sequenced microdissections of the fibrotic 
tissue, and here we also did not find mutations restricted to all liver cells. 

Sequencing data were mapped to the human genome, GRCh37d5, 
using the BWA-MEM algorithm. 


Calling of SNVs 

Substitution variants were called using the Cancer Variants through 
Expectation Maximization (CaVEMan) algorithm, using the bulk sample 
of the liver biopsy as the matched normal. As part of the algorithm, 
the variants were annotated using VAGrENT. Variant calls for bulk 
sequencing data of the cancer samples were not further filtered. For 
sequencing of LCMs, post-filtering was performed in three steps. 

1) Removal of duplicate counts. We noticed instances in which variant 
bases were counted twice, owing to the overlap of paired-end sequenc- 
ing reads. We removed such double counting and re-evaluated variant 
calls after taking double counts into account. 

2) Removal of variants that were introduced during library prepara- 
tion. We noticed the presence of variants that were introduced owing to 
incorrect processing of cruciform DNA. Erroneous variants were often 
present in inverted repeats and frequently accompanied by another 
proximal variant (~1–30 bp distance). These inverted repeats can form cruciform 
DNA before the isolation of DNA or during library preparation. 
The library preparation protocol used can incorrectly process these 
secondary DNA structures and inadvertently introduce one or more 
erroneous variants. For every variant the standard deviation (s.d.) and 
median absolute deviation (MAD) of the variant position within the 
read was separately calculated for positive and negative strand reads. 

In the case that the variant was supported by a low number of reads for 
a particular strand, the filtering was based on the statistics determined 
from the reads derived from the other strand. It was required that either 
i) <90% of supporting reads reported the variant within the first 15% of the 
read, as calculated from the alignment start; or ii) the MAD exceeded 0 
and the s.d. exceeded 4. In the case that sufficient reads supporting the 
variant were available for both strands it was required for both strands 
separately that i) <90% of supporting reads reported the variant within 
the first 15% of the read as calculated from the alignment start; ii) the 
MAD exceeded 2 and the s.d. exceeded 2; or iii) at least one strand fulfilled 
the criteria of a MAD greater than 1 and s.d. greater than 10. 
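
The read-position criteria of step 2 can be sketched as follows. This is an illustrative reading of the text, not the code used in the study: the 90% criterion is read as an upper bound, the sample standard deviation is used, the per-strand criteria are combined as "(i or ii) on both strands, or iii", and the function names and the `min_reads` cut-off for a "low number of reads" are hypothetical.

```python
from statistics import median, stdev


def mad(values):
    """Median absolute deviation of a list of numbers."""
    m = median(values)
    return median(abs(v - m) for v in values)


def strand_stats(positions, read_len=150):
    """Summarise where a variant falls within the supporting reads of one strand.

    positions: variant position within each read, counted from the alignment start.
    """
    early = sum(1 for p in positions if p <= 0.15 * read_len)
    sd = stdev(positions) if len(positions) > 1 else 0.0
    return {"frac_early": early / len(positions), "mad": mad(positions), "sd": sd}


def keep_variant(fwd, rev, read_len=150, min_reads=5):
    """Return True if the variant passes the read-position filter.

    min_reads is a hypothetical threshold; the text does not state one.
    """
    usable = [p for p in (fwd, rev) if len(p) >= min_reads]
    if not usable:
        return False  # assumption: too little support on both strands
    stats = [strand_stats(p, read_len) for p in usable]
    if len(stats) == 1:
        # Only one strand is informative: use the single-strand thresholds.
        s = stats[0]
        return s["frac_early"] < 0.90 or (s["mad"] > 0 and s["sd"] > 4)
    # Both strands are informative: each must pass (i) or (ii), or (iii) applies.
    per_strand = all(
        s["frac_early"] < 0.90 or (s["mad"] > 2 and s["sd"] > 2) for s in stats
    )
    rescue = any(s["mad"] > 1 and s["sd"] > 10 for s in stats)
    return per_strand or rescue
```

A variant whose supporting reads all report it within the first few bases of the read is rejected, whereas a variant seen at well-spread read positions on both strands is kept.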

3) Comparison with an independent panel. To remove variant calls 
at badly mapping sites, we compared variant calls in the sequenced 
samples of each donor biopsy with samples from all unrelated donors 
in our cohort. For each variant site we expected the reference base to 
be dominant and conversely expected badly mapping sites to contain 
frequent non-reference base counts. Thus, we counted the numbers 
of A, C, G, T and indel calls at each variant site across all unrelated samples, 
resulting in a large ‘pileup’ table. The dominance of the reference base 
was evaluated at each variant site using the entropy purity metric E: 

$$E = -\sum_{i} P(x_i) \ln P(x_i)$$

in which $x_i$ is the count of base $i \in \{A, C, G, T\}$ and the $P(x_i)$ are the fractions 
of base calls. Values of E close to 0 indicate that almost all reads in the 
independent panel contain a single base. Higher values of E indicate 
a mix of base calls at the site. To identify an optimal threshold of E for 
the filtering of variant sites, we evaluated the entropy metric against a 
labelled dataset of variant calls. Specifically, during the clustering of 
variants using the Bayesian Dirichlet process (described below), we 
identified clusters that had variants with low allele frequency present 
in all dissections from the same donor. Manual inspection showed that 
such variants occurred at badly mapping sites. Thus, we labelled variant 
sites in those clusters as ‘badly mapping’ and were able to use the area 
under the receiver operator curve (AUC) to identify a threshold value 
for E of 0.16; this allowed us to separate the two labelled variant groups 
with an AUC of 0.99. 
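
As an illustration of the entropy metric, the following minimal sketch computes it from the base counts at one site of the pileup table. The function names are hypothetical, and flagging sites above the 0.16 threshold is an assumption about how the threshold was applied.

```python
import math


def entropy_purity(counts):
    """Shannon entropy (natural log) of the base-call fractions at one site.

    counts: dict mapping each base (A, C, G, T) to its count across the panel.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no base calls at this site")
    entropy = 0.0
    for n in counts.values():
        if n > 0:  # terms with zero count contribute nothing
            p = n / total
            entropy -= p * math.log(p)
    return entropy


def is_badly_mapping(counts, threshold=0.16):
    """Assumed use of the threshold: a high entropy means a mix of bases."""
    return entropy_purity(counts) > threshold
```

A site at which every read in the panel carries the same base has an entropy of 0, and a site with an even split between two bases has an entropy of ln 2 (about 0.69).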


Bayesian Dirichlet process for clustering VAFs across multiple 
samples 

We extend the model previously developed for clustering VAFs of mutations 
called in a single sample to mutation data across multiple samples 
from the same individual. In normal somatic cells, the vast majority of 
the genome retains its normal, diploid copy number, which means that 
we can cluster the VAFs directly (excluding mutations on the X and Y 
chromosomes in males). This has the considerable advantage that the 
Dirichlet process model we build can rely directly on conjugate prior 
distributions. The model includes a potential split-merge step at each 
cycle of the Gibbs sampler, following a previously described Metropolis-Hastings 
proposal for conjugate distributions. The algorithm 
could be extended to include a correction for different copy-number 
states in given samples for a particular mutation through, for example, 
a Metropolis-Hastings update, but at considerable computational cost. 
The full mathematical development of the model is detailed in the Sup- 
plementary Methods. 

We ran the Gibbs sampler for 15,000 iterations, dropping the first 
10,000 as a burn-in. We used the ECR algorithm, implemented in the 
R package label.switching, to resolve the label-switching problem associated 
with mixture models. We dropped clusters that contained more 
than 100 variant sites. 


Construction of phylogenetic trees 

Phylogenetic trees were constructed manually using the pigeonhole 
principle, as described previously. In brief, each cluster that was identified 
using the Bayesian Dirichlet process represented a branch of the 
phylogenetic tree. Nesting of trees was identified with three different 
levels of certainty, illustrated on a pair of branches, A and B. 1) In the case 
that the median VAFs of A and B exceeded 100%, the pigeonhole principle 
defines that A and B are nested. 2) We can assume that non-hepatocyte 
cells constitute a sizeable fraction of each LCM sample. Assuming a 
non-hepatocyte fraction of 30%, we nested branches when the VAFs of 
A and B exceeded 70%. This non-hepatocyte fraction was chosen as a 
conservative estimate of the fraction of cells intermixed in our microdissections 
that are not derived from the hepatocyte clone, on the basis of 
observed VAF peaks in our data together with single-cell RNA sequencing 
data from liver tissue. 3) If identical LCMs are members of both A and 
B, it is highly likely that A and B are nested, rather than independent 
branches. Thus, we also nested branches in cases in which the LCMs in 
one branch were a subset of the LCMs in the other (parental) branch. 

For each nesting scenario, we defined the parental branch as the one 
with the higher median VAF in the contained LCMs. We highlighted the 
evidence level for nesting in each representation of phylogenetic trees, 
marking branches with evidence level 1 with a solid line, level 2 with a 
dashed line and level 3 with a dotted line. 
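
The three evidence levels can be expressed as a small decision function. The sketch below is illustrative only: it assumes that median VAFs have already been converted to cellular fractions (twice the VAF for heterozygous mutations in diploid regions, as in the caption of Extended Data Fig. 10), that the thresholds apply to the sum for the two branches within a shared LCM, and that a branch is a member of the LCMs listed for it. The function name and data layout are hypothetical.

```python
def nesting_evidence(cf_a, cf_b, non_hepatocyte_fraction=0.30):
    """Return the evidence level (1, 2 or 3) for nesting branches A and B.

    cf_a, cf_b: dicts mapping LCM sample name to the median cellular fraction
    of the branch in that sample. Returns None if there is no evidence.
    """
    shared = set(cf_a) & set(cf_b)
    # Largest combined cellular fraction observed in any shared LCM.
    max_sum = max((cf_a[s] + cf_b[s] for s in shared), default=0.0)
    if max_sum > 1.0:
        return 1  # strict pigeonhole principle
    if max_sum > 1.0 - non_hepatocyte_fraction:
        return 2  # pigeonhole principle after allowing for non-hepatocytes
    if shared and (set(cf_a) <= set(cf_b) or set(cf_b) <= set(cf_a)):
        return 3  # LCMs of one branch are a subset of those of the other
    return None
```

Level 1 would be drawn with a solid line, level 2 with a dashed line and level 3 with a dotted line.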


Analysis of driver variants 

We curated a list of genes that have been found to be significantly 
mutated in liver cancers in a selection of published studies, as 
shown in Supplementary Table 5. Using the VAGrENT annotations, we 
counted any regulatory, missense, nonsense, frameshift or essential 
splice variant as a potential driver variant. To systematically identify 
genes under mutagenic selection, we used the dN/dS method, which 
screens for genes with an excess of non-synonymous mutations com- 
pared to that expected from the synonymous mutation rate. 


Sensitivity correction 

We identified 138 pairs of LCMs with a midpoint-to-midpoint distance 
of <500 μm and at least one shared cluster according to the Bayesian 
Dirichlet process. These LCMs we assumed to represent the same clone, 
thus providing an opportunity to calculate the sensitivity of calling a 
variant present in one LCM in the other. If we assume the sensitivity is 
the same in both samples, then the maximum likelihood estimate for 
the sensitivity, when mutations not called in either sample are unob- 
served, is given by: 


$$\text{sensitivity} = \frac{2 n_2}{n_1 + 2 n_2}$$


in which $n_2$ is the number of variants called in both LCMs in each pair 
and $n_1$ is the number of variants called only in one of the two LCMs. To 
evaluate the relationship of sensitivity with depth of coverage and VAF, 
we performed a logistic regression of sensitivity against these two predictors 
using the lm() function of the R programming language. The 
model fit was then used to calculate sensitivity for any LCM sample, 
given the coverage and VAF of the sample. 
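
For a single pair of LCMs, the estimate reduces to counting shared and unshared calls. A minimal sketch, assuming each LCM's calls are available as a set of variant identifiers (the function name and data layout are hypothetical):

```python
def pair_sensitivity(calls_a, calls_b):
    """Maximum likelihood sensitivity for one pair of LCMs from the same clone.

    calls_a, calls_b: sets of variant identifiers called in each LCM.
    Variants missed in both samples are unobserved and so do not enter.
    """
    n_both = len(calls_a & calls_b)  # called in both LCMs
    n_one = len(calls_a ^ calls_b)   # called in exactly one LCM
    if n_both == 0 and n_one == 0:
        raise ValueError("no variants called in either sample")
    return 2 * n_both / (n_one + 2 * n_both)
```

For example, a pair with two shared calls and three calls private to one sample gives 4/7, or about 0.57.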


Analysis of mutational burden 

We used a linear mixed effects model to fit the number of variants 
per LCM sample against the disease aetiology (normal or cirrhotic) 
and age for each individual. We defined the ID of the individual as a 
random effect. The slope of the age coefficient was allowed to vary 
with the random effect. To facilitate the analysis, we used the lmer() 
function within the lme4 package of the R programming language. To 
determine the significance of the aetiology and age coefficients, we 
used an analysis of variance (ANOVA) to perform a χ² test that compared 
our model with models omitting the aetiology and age coefficients, 
respectively. 
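
One way to write the model described above is given here as a sketch. The notation is introduced for illustration and is not taken from the study, and the inclusion of a random intercept alongside the random age slope is an assumption.

```latex
% y_{ij}: number of variants in LCM sample j of individual i
% c_i: aetiology indicator (0 = normal, 1 = cirrhotic); a_i: age of individual i
% u_{0i}, u_{1i}: random intercept and random age slope for individual i
$$y_{ij} = \beta_0 + \beta_1 c_i + (\beta_2 + u_{1i})\,a_i + u_{0i} + \varepsilon_{ij}$$
```

In lme4 syntax, a formula of the form `n_variants ~ aetiology + age + (age | id)` would correspond to this reading; the variable names are hypothetical. The likelihood-ratio comparison then refits the model without the aetiology or the age term and tests the change in fit.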


Targeted deep sequencing validation of mutation calls 

For 96 of the microdissections sequenced by whole-genome sequencing, 
we performed a targeted deep sequencing validation using an Agilent 
RNA bait set that covered 350 recurrently mutated cancer genes. Among 
these genes, a total of 17 mutations were identified in the whole-genome 
sequencing data from the 96 samples; of these, 16 (94%) were validated, 
at comparable VAFs, in the targeted deep sequencing data. 


Calling of indels 

Indels were called using cgpPindel. Variant calls for bulk sequencing 
data of the cancer samples were not further filtered. To remove 
artefactual calls from the LCM-derived data, we performed two post-filtering 
steps. 

1) Assignment to SNV-based clusters. We evaluated how well the VAF 
distribution of each indel across the LCMs from the same donor com- 
pared with the VAF distribution of each SNV-based cluster as identified 
by the Bayesian Dirichlet process. Given an indel in one LCM sample, 
we thus counted its occurrence in all related LCMs and assigned the 
resulting VAF profile to the VAF profiles of the SNV clusters using a Bayes’ 
classifier. We noticed that many indels were assigned to SNV clusters 
with more than 100 variants, which we had previously removed from 
the SNV analysis. On closer inspection we noticed that those indels 
had low VAF and occurred frequently in badly mapping regions. We thus 
discarded indels that were assigned to those clusters. 

2) Filtering on the basis of beta-binomial overdispersion parameter. 
We noticed that many indels occurred with low VAF in a large number 
of LCMs from the same donor and were, thus, probably artefactual. To 
systematically identify such indels, we fitted the beta-binomial distribu- 
tion to the variant counts of each indel across the LCMs from the same 
donor. The fitted parameter ρ (the overdispersion parameter) was used 
to filter indel calls. A high value for parameter ρ (overdispersion) occurs 
when some LCMs have many variant read counts and others few or none. 
Conversely, a low value occurs when all LCMs have a similar number of 
variant counts (no overdispersion). On the basis of manual inspection, 
we removed variant calls with ρ < 0.02. 
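
The overdispersion filter can be illustrated with a simple grid-search fit. This is a sketch under stated assumptions rather than the fitting procedure used in the study: the mean is fixed at the pooled variant fraction, the overdispersion parameter is taken as 1/(α + β + 1) and maximized over a log-spaced grid, and the function names are hypothetical.

```python
import math


def lbeta(a, b):
    """Natural log of the beta function."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def betabinom_loglik(k, n, mu, rho):
    """Beta-binomial log-likelihood with mean mu and overdispersion rho."""
    a = mu * (1 - rho) / rho
    b = (1 - mu) * (1 - rho) / rho
    total = 0.0
    for ki, ni in zip(k, n):
        log_choose = math.lgamma(ni + 1) - math.lgamma(ki + 1) - math.lgamma(ni - ki + 1)
        total += log_choose + lbeta(ki + a, ni - ki + b) - lbeta(a, b)
    return total


def estimate_rho(k, n):
    """Grid-search estimate of the overdispersion for one indel.

    k: variant read counts per LCM; n: total read depths per LCM.
    """
    mu = sum(k) / sum(n)
    if not 0 < mu < 1:
        raise ValueError("pooled variant fraction must lie strictly between 0 and 1")
    grid = [10 ** (-6 + 0.05 * i) for i in range(120)]  # 1e-6 up to about 0.89
    return max(grid, key=lambda rho: betabinom_loglik(k, n, mu, rho))
```

An indel seen at a similar low level in every LCM yields an estimate close to zero and would be removed at the 0.02 cut-off, whereas an indel confined to a few LCMs yields a large estimate and would be retained.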


Calling of copy numbers 

Copy numbers were called using the ASCAT algorithm, assuming an 
expected ploidy of 4 (to allow for physiologically polyploid hepatocytes) 
and 60% non-hepatocyte cell contamination for all samples. Testing 
of robustness around these starting points (different expected ploidy 
or purity values) found that the specific values used did not materially 
affect the output. Variant calls for bulk sequencing data of the cancer 
samples were not further filtered. To remove artefactual variants from 
the LCM-derived data, we used the SNV-based phylogenetic information. 
The genome was segmented into 500-bp bins and the ASCAT-based copy 
number of each bin was calculated. Using the binned copy-number data 
we calculated the median copy number in each LCM sample and ASCAT 
event. For each ASCAT event and LCM sample we assigned its absolute 
deviation from the diploid state. We compared the copy-number profile 
for each ASCAT event across the LCM samples with the VAF profile of 
each SNV cluster using cosine similarity (described below) to identify 
the most similar SNV cluster. Within each SNV cluster we proceeded to 
merge overlapping ASCAT events. Using manual inspection, we decided 
to keep ASCAT events if 1) they had a cosine similarity of <0.1 to an SNV 
cluster; and 2) their assigned SNV cluster was not removed during SNV 
analysis owing to having more than 100 assigned SNVs. 


Calling of structural variants 

Structural variants were called using the BRASS algorithm 
(https://github.com/cancerit/BRASS). Variant calls for bulk sequencing data 
of the cancer samples were not further filtered. To remove artefactual 
variants from the LCM-derived data, we used post-processing filters. 
Manual inspection of the sequencing reads identified for each structural 
variant showed that many reads were identical except for frameshifts 
at repetitive sites. We decided that such reads represented duplicates 
and designed a filter to systematically remove these. We removed 
structural variants that were supported by more than two reads after 
removal of duplicates. Each remaining structural variant call was manu- 
ally inspected. 


Calculation of clone size 

We determined the midpoint coordinates of each LCM manually from 
the microscopy images collected during dissection. For each LCM that 
belonged to a clone as determined by the Bayesian Dirichlet process, 
we used the chull function of the R programming language to identify 
the coordinates of the convex hull that included all LCMs. We identified 
the midpoint of each polygon as the average coordinate of all convex 
hull vertices. The size of the clone was then assigned to be the Euclidean 
distance between each convex hull vertex and the midpoint of the 
polygon. For clones that only consisted of a single LCM, we assigned the 
minimum clone size discovered across all clones. 
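
The geometry of the clone-size calculation can be sketched as follows. The convex hull is computed here with a standard monotone-chain routine in place of R's chull, and the vertex-to-midpoint distances are averaged; the averaging is an assumption, because the text does not state how the distances were combined. Function names are hypothetical.

```python
import math


def convex_hull(points):
    """Convex hull vertices of 2D points (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def clone_size(midpoints):
    """Mean distance from the hull midpoint to the hull vertices.

    midpoints: list of (x, y) tuples, one per LCM in the clone.
    Returns None for a single LCM; the caller then assigns the minimum
    clone size observed across all clones, as described in the text.
    """
    hull = convex_hull(midpoints)
    if len(hull) < 2:
        return None
    cx = sum(x for x, _ in hull) / len(hull)
    cy = sum(y for _, y in hull) / len(hull)
    return sum(math.hypot(x - cx, y - cy) for x, y in hull) / len(hull)
```

For four LCMs at the corners of a square of side 2 (with or without a fifth LCM in the centre), the hull has four vertices and the size is the square root of 2.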


Extraction of mutational signatures from SNV contexts using 
HDP 

Mutational signatures were extracted using the HDP package 
(https://github.com/nicolaroberts/hdp), relying on the Bayesian hierarchical 
Dirichlet process. The units of signature extraction were mutations 
assigned to individual branches of the phylogenetic tree, grouped per 
patient, from the LCM data. In addition, to provide a comparison against 
signatures extracted in HCCs, we added catalogues of somatic substitutions 
from 54 whole genomes sequenced by TCGA, analysed using 
the same core algorithms as were used for the LCM data. The tool was 
used without defining prior signatures. As hyperparameters we set α 
and β to 6 for the α clustering parameter. Extraction was started with 
40 data clusters (parameter ‘initcc’). The Gibbs sampler was run with 
10,000 burn-in iterations (parameter ‘burnin’). With a spacing of 50 
iterations (parameter ‘space’), 50 iterations were collected (parameter 
‘n’). After each Gibbs sampling iteration, three iterations of concen- 
tration parameter sampling were performed (parameter ‘cpiter’). The 
resulting signatures were compared to published signatures using 
the cosine similarity metric described below. Extracted signatures with 
cosine similarity >0.9 compared to a known signature from either the 
COSMIC or PCAWG catalogue of signatures were assigned the name 
of the known signature with the highest similarity. Extracted signatures 
with cosine similarity <0.9 compared to any of the known signatures were 
assigned new names, which were indexed with the letters A, B and C. 


Extraction of mutational signatures from SNV contexts using 
SigProfiler 

We used SigProfiler to extract mutational signatures, relying on the 
non-negative matrix factorization method. In particular, we report 
the ‘Decomposed Solution’ output by the SigProfiler package. 


Cosine similarity calculation 


To compare two vectors A and B, cosine similarity was calculated as 
follows: 

$$\text{Similarity} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$
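
A minimal sketch of the cosine similarity calculation follows, together with an illustrative use for naming an extracted signature against a catalogue at the 0.9 threshold given above. The function names and the dictionary layout of the catalogue are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def name_signature(extracted, known, threshold=0.9):
    """Return the name of the most similar known signature, or None.

    known: dict mapping signature name to its vector of context frequencies.
    None means that no known signature exceeds the threshold, so the
    extracted signature would receive a new name.
    """
    best_name, best_sim = max(
        ((name, cosine_similarity(extracted, ref)) for name, ref in known.items()),
        key=lambda item: item[1],
    )
    return best_name if best_sim > threshold else None
```

Because the metric depends only on the angle between the vectors, it is unaffected by the total number of mutations in either profile.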


Analysis of the proportion of indels and gene expression 

A list of transcribed regions was retrieved from Ensembl using the 
BioMaRt package. We identified the subset of indel and SNV variants 
that overlapped with the transcribed regions. The proportion of indels 
in comparison to the total number of indels and SNVs per gene was calculated. 
Gene expression was assigned using the liver dataset from the 
Genotype-Tissue Expression project (GTEx). To test for the relationship 
between gene expression and the proportion of indels, we fitted a Pois- 
son regression using the glm function of the R programming language. 
We modelled the number of indels per gene against an offset of the total 
number of variants per gene and the expression of the gene. 
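
The regression described above can be written as follows. The notation is introduced here for illustration: $I_g$ is the number of indels in gene $g$, $N_g$ is the total number of variants (indels and SNVs) in that gene and $x_g$ is its expression in liver. Whether expression entered on a linear or logarithmic scale is not stated in the text.

```latex
% The offset log N_g has a fixed coefficient of 1, so the linear predictor
% models the expected proportion of variants that are indels.
$$\log \mathbb{E}[I_g] = \log N_g + \beta_0 + \beta_1 x_g$$
```

Under this form, $\exp(\beta_0 + \beta_1 x_g)$ is the expected proportion of indels among all variants in a gene with expression $x_g$, and a positive $\beta_1$ indicates that the proportion rises with expression.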


Analysis of T>C transcriptional-strand bias at transcription start 
sites 

We performed this analysis in a similar way to a published approach. In 
brief, we retrieved the genomic coordinates of transcription start sites of 
all the highly expressed genes in the liver from GTEx. We tiled the 10 kb 
upstream and downstream of the transcription start site into 1,000-bp 
bins. We overlapped all T>C (transcribed) and A>G (non-transcribed) 
variant calls with the tiled regions and summed the number of variants 
in each tile across all included genes. We also extracted the number of T 
and A bases in each tile. To test whether the strand bias was significant 
only in transcribed regions, we fitted a Poisson regression for the number 
of variant calls against the following predictors: strand (transcribed, 
non-transcribed), distance from transcription start site (0 for upstream, 
1 for downstream) and aetiology (cirrhosis, no cirrhosis), and used the 
number of T and A bases in each tile as the offset variable. 
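
The tiling step can be sketched as below. This is illustrative only: the function names are hypothetical, and orienting the offset by gene strand (so that negative tiles are always upstream of the transcription start site) is an assumption.

```python
from collections import Counter


def tile_index(variant_pos, tss_pos, gene_strand="+", flank=10_000, bin_size=1_000):
    """Assign a variant to a 1,000-bp tile relative to a transcription start site.

    Returns an integer from -10 to 9 (negative = upstream, 0 to 9 = downstream),
    or None if the variant lies outside the 10-kb flanks.
    """
    offset = variant_pos - tss_pos
    if gene_strand == "-":
        offset = -offset  # measure distance in the direction of transcription
    if offset < -flank or offset >= flank:
        return None
    return offset // bin_size  # floor division keeps upstream tiles negative


def count_per_tile(variants):
    """Sum variants per tile across genes.

    variants: iterable of (variant_pos, tss_pos, gene_strand) tuples.
    """
    counts = Counter()
    for pos, tss, strand in variants:
        tile = tile_index(pos, tss, strand)
        if tile is not None:
            counts[tile] += 1
    return counts
```

The per-tile counts, split by strand and aetiology, would then form the response of the Poisson regression, with the number of T and A bases per tile as the offset.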


Analysis of C>A and T>A transcriptional-strand bias 

We used the MutationalPatterns package to assign the transcription 
state for each C>A variant. We retrieved the genomic coordinates of all 
transcribed regions from Ensembl using the BioMaRt package and 
extracted the frequencies of C and G nucleotides in these regions. To 
test for the significance of transcriptional-strand bias, we performed 
a Poisson regression for the number of C>A variants in each sample 
and transcription strand against factor variables for the transcription 
strand, the patient ID and an interaction term for the two factors. We used 
the C and G nucleotide frequencies as an offset variable. To test for the 
significance of transcriptional-strand bias for a given donor, we coded 
the patient ID in a binary fashion: ‘1’ for the target donor, ‘0’ otherwise. 
We proceeded to test for transcriptional-strand bias of T>A variants in 
a similar way, using A and T nucleotide frequencies as the offset. 


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 

Whole-genome sequencing data across the samples reported in this 
study have been deposited in the European Genome-Phenome Archive 
(https://www.ebi.ac.uk/ega/home) in the form of BAM files, with accession 
number EGAD00001004578. Substitution and indel calls have been 
deposited on Mendeley Data with the identifier 
https://doi.org/10.17632/ktx7jp8sch.1 (‘Somatic mutations and clonal dynamics in healthy and 
cirrhotic human liver’). 


Code availability 

Single-nucleotide substitutions were called using the CaVEMan algorithm, 
v.1.11.2 (https://github.com/cancerit/CaVEMan). Small insertions 
and deletions were called using the Pindel algorithm, v.2.2.2 
(https://github.com/genome/pindel). Rearrangements were called using the 
BRASS (breakpoint via assembly) algorithm v.5.4.1 
(https://github.com/cancerit/BRASS). Miscellaneous scripts for downstream analysis are 
available on GitHub (https://github.com/sfbrunner/liver-pub-repo). 
The analysis of mutational signatures was performed using the HDP 
hierarchical Dirichlet process package v.0.1.5, which is available on 
GitHub (https://github.com/nicolaroberts/hdp). 
Extended Data Fig. 1 | Sensitivity analysis of SNV calls. a, Overview schematic 
of the experimental and analytical approach. b, Examples of the VAFs of variants 
from unrelated (top) and related (bottom) pairs of microdissection samples 
from four donors (left to right). The x axis represents the VAF of sample 1 from 
each pair and the y axis represents the VAF of sample 2. Each dot represents one 
variant. Red, variants called in both samples; yellow, variants called in sample 1; 
blue, variants called in sample 2. c, Histogram of sensitivities calculated for each 
sample pair. d, Heat map of modelled sensitivity at different values of VAF and 
coverage. The overlaid dots represent the sample pairs that were used to fit the 
model. e, Relationship of VAF, sensitivity and coverage according to the fitted 
model of sensitivity. The overlaid dots represent the sample pairs that were used 
to fit the model. f, Comparison of calculated (x axis) and fitted (y axis) sensitivity 
for each sample pair (n = 34 pairs of samples). The R² value is the Pearson’s 
correlation coefficient. g, Proportion of hepatocytes that are multinucleated in 
the samples analysed here, estimated by counting 500 cells in each H&E-stained 
section (n = 14 patients). Each point represents the proportion for one patient in 
the study. The horizontal bars represent the mean for that aetiological group. 


[Extended Data Fig. 2 image. Panels a, b: genome-wide copy-number profiles. Panels c–h: copy number against chromosome position (Mb) for regions of chromosomes 2, 4, 5, 8, 9, 10, 12 and X, with annotated events including ACVR2A deletions and an inversion causing deletion of CDKN2A.] 

Extended Data Fig. 2 | Copy-number and structural variants in chronic liver 
disease. a, b, Genome-wide copy-number profiles for two samples. Black points 
represent the read depth of discrete windows along the chromosome, corrected 
to show overall copy number. Arm-level and whole-chromosome gains and 
losses are evident. c–h, Focal copy-number changes and structural variants. 
Black points represent the read depth of discrete windows along the 
chromosome, corrected to show overall copy number. Lines and arcs represent 
individual structural variants, coloured by the orientation of the joined ends 
(purple, tail-to-tail inverted; brown, head-to-head inverted; turquoise, 
tandem-duplication-type orientation; green, deletion-type orientation). Events that 
affect known HCC genes are marked with labelled arrows (c, e, f). 
Extended Data Fig. 3 | Events that affect known HCC genes in the cohort. 
a, Distribution of somatic point mutations in individual microdissections 
(x axis) affecting known HCC genes (y axis), coloured by class of mutation 
according to the key underneath the panel. TERTp, TERT promoter. b, Genomic 
position of SNVs (top; light-blue strip) and indels (bottom; dark-blue strip) 
detected in ALB, the gene encoding albumin. c, Relationship of gene expression 
in liver tissue (x axis) and the proportion of indels as a fraction of all point 
mutations (y axis). The grey line represents a Poisson regression model with a 
significant (two-sided likelihood ratio test; P< 10°) coefficient for gene 
expression as a predictor for the ratio of indels (n = 5,458 genes included in the 
model). The grey ribbon represents the 99% confidence interval of the 
parameter estimates. 
Extended Data Fig. 4 | Phylogenetic reconstruction of hepatocyte clones in 
non-cirrhotic liver samples. Left, heat maps representing the clustering of the 
variants observed in each microdissection sample (x axis) of the non-cirrhotic 
livers. Each cluster (y axis) contains mutations for which the VAFs across samples 
are very similar. The colour scale of the boxes represents the estimated mean 
VAF for that cluster in that sample. Middle, phylogenetic trees constructed from 
the clustering information. Solid lines indicate that nesting is in accordance with 
the pigeonhole principle; dashed lines indicate that nesting is in accordance 
with the pigeonhole principle, assuming that hepatocytes represent 70% of 
cells; dotted lines indicate that nesting is only based on clustering (a clone is 
assigned as nested if its constituent LCMs are a subset of LCMs in the parental 
clone). For details, see Supplementary Methods. Right, representation of clones 
according to the physical coordinates of the LCM samples, overlaid onto H&E-stained 
sections (top). Sections stained with Masson’s trichrome and Oil Red O 
are also shown (bottom). Locations of immune or inflammatory cell infiltrates 
are marked with yellow rings. Sample sizes: PD36713, n = 30 microdissections; 
PD36714, n = 35 microdissections; PD36715, n = 26 microdissections; PD36717, 
n = 42 microdissections; PD36718, n = 32 microdissections. 
sections (bottom left); and macroscopic photographs of the liver, with HCCs microdissections. 
Extended Data Fig. 7 | Mutation spectra for individual microdissections. 
From each donor, we chose five clones to represent the heterogeneity in 
mutation spectra in the trinucleotide context. The six types of substitution are 
labelled across the top. Within each panel, the contributions from the 
trinucleotide context (bases immediately 5’ and 3’ of the mutated base) are 
shown. 
Extended Data Fig. 8 | Details of the extraction of mutational signatures. 
a, Dot plots showing the concordance for signature attributions between the 
two signature algorithms (n = 479 microdissections). Mutational signatures on 
the y axis were extracted using non-negative matrix factorization and those on 
the x axis were extracted using a Bayesian hierarchical Dirichlet process. The 
R values are Pearson’s correlation coefficients. b, Signatures extracted by 
non-negative matrix factorization. The six substitution classes are separated by grey 
vertical lines, and are presented in the following order: C>A, C>G, C>T, T>A, T>C, 
T>G. Within each class of mutation, the contributions from the trinucleotide 
context (bases immediately 5’ and 3’ of the mutated base) are shown. 
c, Signatures extracted by the Bayesian hierarchical Dirichlet process, as for b. 
Where a signature matches one from b, it is shown on the same row. 
Extended Data Fig. 9 | Transcriptional-strand bias in patterns of mutations. 
a, Transcriptional-strand bias of T>C mutations at the ATD context before and 
after the transcription start site (TSS) of highly expressed liver genes. b, Bar 
plots representing the numbers of C>A variants on the transcribed and 
non-transcribed strands. Each hepatocyte clone is represented individually (x axis). 
Note the strand bias in the highly mutated clones of PD37111, in which the 
[…] base is the guanine, as expected for polycyclic aromatic hydrocarbons. c, Bar 
plots representing the numbers of T>A variants on the transcribed and 
non-transcribed strands. Each hepatocyte clone is represented individually (x axis). 
Note the strand bias in the highly mutated clones of PD37107, in which the 
aristolochic acid signature is most active; the strand bias indicates that the 
damaged base is the adenine. 


Extended Data Fig. 10 | Mutations in a B lymphocyte clone in a cirrhotic liver. 
a, Illustration of a portion of the B cell receptor (IGH) region on chromosome 14. 
Shown are the coverage tracks of an LCM sample that does not belong to the 
lymphocyte lineage (top) and a sample that belongs to the lymphocyte lineage 
(middle). In the centre of the displayed region there is a drop of copy number in 
the lymphocyte track, which indicates a structural rearrangement. The bottom 
track shows the paired-end reads that contribute to a rearrangement event in the 
lymphocyte sample, colocalized with the drop in copy number. b, Application of 
the pigeonhole principle: if two clusters of heterozygous mutations in regions of 
diploid copy number are in different cells, then their median VAFs must sum to 
<0.5 (if they sum to >0.5, equivalent to a combined cellular fraction of >1, then 
there must be some cells that carry both sets of mutations; hence one cluster 
would have a subclonal relationship with the other). Cluster 10 is the cluster with 
the unique VDJ rearrangement of IGH that is shown in a and the large number of 
mutations attributed to signature 9. Clearly, samples from clusters 2, 11, 55 and 
so on have VAFs which, when combined with cluster 10, sum to >0.5. Therefore, 
they must be subclonal to cluster 10, even though they do show signature 9. 
c–h, Representative pairwise decision graphs for clusters of mutations. The 
median cellular fraction is shown for pairs of clusters across every sample from 
the patient. Where at least one sample falls above or to the right of the x + y = 1 
diagonal line, those two clusters must share a nested clonal-subclonal 
relationship. 
chronic liver disease who had a synchronous hepatocellular carcinoma, meaning that patients were at advanced stages of 
Multicellular organisms have co-evolved with complex consortia of viruses, bacteria, 
fungi and parasites, collectively referred to as the microbiota. In mammals, changes in 
the composition of the microbiota can influence many physiologic processes 
(including development, metabolism and immune cell function) and are associated 
with susceptibility to multiple diseases. Alterations in the microbiota can also 
modulate host behaviours (such as social activity, stress, and anxiety-related 
responses) that are linked to diverse neuropsychiatric disorders. However, the 
mechanisms by which the microbiota influence neuronal activity and host behaviour 
remain poorly defined. Here we show that manipulation of the microbiota in antibiotic- 
treated or germ-free adult mice results in significant deficits in fear extinction learning. 
Single-nucleus RNA sequencing of the medial prefrontal cortex of the brain revealed 
significant alterations in gene expression in excitatory neurons, glia and other cell 
types. Transcranial two-photon imaging showed that deficits in extinction learning 
after manipulation of the microbiota in adult mice were associated with defective 
learning-related remodelling of postsynaptic dendritic spines and reduced activity in 
cue-encoding neurons in the medial prefrontal cortex. In addition, selective re-establishment
of the microbiota revealed a limited neonatal developmental window in
which microbiota-derived signals can restore normal extinction learning in adulthood. 
Finally, unbiased metabolomic analysis identified four metabolites that were 
significantly downregulated in germ-free mice and have been reported to be related to 
neuropsychiatric disorders in humans and mouse models, suggesting that microbiota- 
derived compounds may directly affect brain function and behaviour. Together, these 
data indicate that fear extinction learning requires microbiota-derived signals both 
during early postnatal neurodevelopment and in adult mice, with implications for our
understanding of how diet, infection, and lifestyle influence brain health and
subsequent susceptibility to neuropsychiatric disorders.


Pavlovian fear conditioning is an evolutionarily conserved associative 
learning process that is critical for the survival of an organism and for the
ability to respond appropriately to neutral stimuli that reliably predict 
dangerous or aversive outcomes’. In the classical fear conditioning 
paradigm, extinction learning occurs when repeated cue presentations 
are no longer paired with an unconditioned stimulus (such as a foot
shock) and the organism learns to modify its behaviour accordingly. 
Deficits in extinction learning after an environmental threat has passed 
have been implicated in multiple neuropsychiatric disorders, including 
post-traumatic stress disorder and other anxiety disorders’. Clinical and 
epidemiological studies have reported correlations between changes in 
the microbiota and other neuropsychiatric disorders® ®. Animal studies
indicate that the absence or modification of the intestinal microbiota
affects neurogenesis’, cortical myelination”®, the function of the blood-brain
barrier” and maturation of microglia”, as well as social behaviour,
stress-related responses and fear learning?”*"*. However, there are con- 
flicting reports on how the microbiota influence behaviour”, and the 
mechanisms through which the microbiota regulate associative learning 
and its neurobiological substrates remain unclear. 


Lack of microbiota impairs extinction learning 


To test whether the microbiota influence fear conditioning and extinc- 
tion, we first treated adult mice with antibiotics (termed ABX mice)” 
and used a classical cued fear conditioning and extinction learning 
paradigm’®. ABX mice and control mice showed comparable food and 


A list of affiliations appears at the end of the paper.
water intake and weight gain (Extended Data Fig. 1a–c). The bacterial
burden was 600-fold lower in ABX mice than in control mice (Extended 
Data Fig. 1d), and 16S ribosomal DNA (rDNA) sequencing revealed an 
antibiotic-induced shift in bacterial community structure (Extended 
Data Fig. 1e–g). Following fear conditioning, ABX mice displayed equivalent
freezing behaviour to control mice, indicating that their acquisition
of fear conditioning was normal (Fig. 1a). Extinction learning reduced 
conditioned freezing in control mice"®. By contrast, extinction learning 
was significantly impaired in ABX mice (Fig. 1b, c). To further examine 
the influence of the microbiota on extinction learning, we performed 
a similar cued fear conditioning and extinction learning assay in adult 
germ-free (GF) mice. To maintain the microorganism-free status of the 
GF mice, we used a modified single-session fear extinction protocol”. 
Again, both ABX and GF mice exhibited impaired extinction learning 
(Fig. 1d, e). These data show that signals derived from the microbiota 
are necessary for optimal extinction of conditioned fear responses. 
The vagus nerve is one route of neuronal communication between 
the intestine and the brain?°”". We investigated whether the vagus 
nerve is involved in extinction learning deficits following manipula- 
tion of the microbiota by carrying out surgical vagotomy on adult mice. 
Vagotomized ABX mice exhibited similar deficits in extinction learning
to sham-operated ABX mice, suggesting that the extinction learning
deficits in ABX mice are independent of the vagus nerve (Extended Data
Fig. 2).

Fig. 1 | ABX and GF mice are less prone to fear extinction. a–c, Acquisition of
fear conditioning (FC) (a), fear extinction over the course of three days or
sessions (b) and fear conditioning after three days (c) in control (Ctrl) and ABX
mice. S, session; T, tone. Data pooled from two independent experiments. n = 30
mice per group. Mean ± s.e.m.; unpaired two-sided t-test (a, c); for b, area under
the curve (AUC) was calculated for each mouse within each group followed by
unpaired two-sided t-test between groups. d, e, Fear extinction in control versus
ABX (d) or control versus GF (e) mice in the single-session 30-tone fear
extinction assay. Data pooled from two independent experiments, n = 12 mice
per group. Mean ± s.e.m.; the AUC was calculated for each mouse within each
group followed by unpaired two-sided t-test between groups. f, Principal
component analysis (PCA) of mouse mPFC transcriptome after fear extinction.
n = 4 mice per group. Permutational multivariate analysis of variance
(PERMANOVA): F = 5.00, Df = 1, P = 0.027. g, Volcano plot of differential
expression of genes in ABX versus control groups in f. Red dots, differentially
expressed genes (DESeq2 Wald test, false discovery rate (FDR) < 0.1). FC, fold
change. h, Heat maps showing the top 50 most significantly downregulated or
upregulated genes in g. Low-expression genes with mean normalized counts in
the bottom 20th percentile were excluded. i, j, STRING network visualization of
the genes in h. Edges represent protein–protein associations. Disconnected
nodes were excluded. k, Significantly enriched KEGG pathways based on all
differentially expressed genes in g. l–o, Immunofluorescence (IF) staining (l, n)
and the density of c-FOS+ neurons (m, o) in the BLA (l, m) or IL (n, o) of control
and ABX mice after fear extinction session 3. Data are representative of two
independent experiments. n = 4 mice per group. Mean ± s.e.m.; unpaired two-sided
t-test. PL, prelimbic. Scale bar, 200 μm.

Given that microbiota-derived signals can regulate the immune system 
and that immune cells can influence brain function and behaviour”, 
we tested whether the extinction learning deficits were associated with 
alterations in immune responses in the brain. Compared to control mice, 
ABX and GF mice showed no differences in the percentages and numbers 
of CD45hi leukocytes in the brain (Extended Data Fig. 3a, b, e), and no
differences in the percentages of CD4+ T cells, CD8+ T cells, CD19+ B
cells, CD11c+ dendritic cells, F4/80+ macrophages or Ly6Chi monocytes
(Extended Data Fig. 3c, d, f–j). Moreover, Rag1−/− mice, which lack adaptive
immune cells, exhibited normal extinction learning (Extended Data
Fig. 3k), whereas GF Rag1−/− mice showed deficits in extinction learning
(Extended Data Fig. 3l), indicating that the adaptive immune system is
not required for extinction learning deficits in ABX and GF mice.

Given that deficits in extinction learning appear to occur indepen- 
dently of changes in the immune system, we investigated their neuro- 
anatomical basis. We performed genome-wide transcriptional profiling 
of the medial prefrontal cortex (mPFC), an area of the brain known to 
be crucial for extinction learning”, from adult ABX and control mice. 
mPFC tissue dissected from adult ABX and control mice that had not 
undergone fear conditioning and extinction exhibited comparable 
transcriptomes (Extended Data Fig. 4a, b). However, extinction learning 
led to significant differences in the transcriptomes of mPFC tissue from 
ABX and control mice (Fig. 1f, g). Search tool for recurring instances of 
neighbouring genes (STRING) analysis depicted networks of interactions 
of the differentially expressed genes (DEGs) between ABX and control 
samples (Fig. 1h-j); Kyoto Encyclopedia of Genes and Genomes (KEGG) 
and Gene Ontology (GO) enrichment analyses identified pathways that 
are associated with neuronal activity, synapse function, CNS matura- 
tion and the regulation of synaptic plasticity and the development of 
postsynaptic dendritic spines (Fig. 1k, Supplementary Table 1). 

To test whether alterations in gene expression associated with these 
neuronal processes were associated with changes in neuronal activity, 
we examined neuronal activity in fear learning circuits by analysing 
c-FOS expression” in the basolateral amygdala (BLA), which is critical 
for encoding and storing conditioned fear memory”, and in the infralimbic
region (IL) of the mPFC, which facilitates extinction learning”.
Compared to control mice, ABX and GF mice exhibited a higher density of
c-FOS+ neurons in the BLA (Fig. 1l, m, Extended Data Fig. 4c, d) and
a lower density of c-FOS+ neurons in the IL (Fig. 1n, o, Extended
Data Fig. 4e, f), which is consistent with their deficits in extinction 
learning. 


Neuronal and glial changes in ABX mice 


To define the cell subsets in the mPFC that contribute to the effect of the 
microbiota on extinction learning, we performed single-nucleus RNA 
sequencing (snRNA-seq) of mPFC samples dissected from ABX and con- 
trol mice after extinction learning, and identified 24 cell clusters (Fig. 2a, 
Extended Data Fig. 5a). Changes in the microbiota were associated with 
significant changes in gene expression in multiple clusters (Extended 
Data Fig. 6). Among the neuronal clusters, excitatory neurons (Fig. 2b, 
Extended Data Fig. 6) were more affected than inhibitory neurons 
(including PVALB+TAC1+, SST+, VIP+ and NPY+ subsets) (Extended Data
Figs. 5b, 6). Astrocytes, myelinating oligodendrocytes and microglia 
also showed changes in gene expression (Extended Data Fig. 6). DEGs 
that were shared across subsets of excitatory neurons (Extended Data 
Fig. 7) and across multiple cell types (Extended Data Fig. 8) were linked 
to synapse-related pathways and calcium signalling pathways (Supplementary
Tables 2, 3), which is consistent with our bulk RNA-seq data
(Fig. 1k) and supports a model in which gene expression is altered in 
brain-resident cells—including specific cell populations such as excita- 
tory neurons and microglia—following manipulation of the microbiota. 

Given that microglia are important for maintaining neuronal function
and brain health by dynamically regulating synaptic pruning and
surveying their local microenvironments, and have been reported to be
affected by the microbiota””’, we further investigated the DEGs in micro- 
glia (Fig. 2c). The microglial DEGs were enriched in pathways related to 
synapse organization and synapse assembly (Fig. 2d, Supplementary 
Table 4), suggesting that deliberate manipulation of the microbiota 
may alter microglia-mediated synaptic pruning. In addition, consist- 
ent with the literature”, we found elevated percentages and numbers 
of microglia in GF mice, with elevated expression of CSF1R and F4/80
(Extended Data Fig. 9a–d). The percentages and numbers of microglia
in ABX mice were not changed, with no changes in CSF1R expression,
but F4/80 expression was elevated (Extended Data Fig. 9e–h). CSF1R
and F4/80 are strongly developmentally regulated, and their expression
decreases during maturation”. Together, these data suggest that
microglia in GF and ABX mice exhibit an immature state reminiscent of
developing juvenile microglia, which may in turn contribute to deficits 
in extinction learning by disrupting dendritic spine remodelling. 


Defective extinction-related spine remodelling 

Next, we used two-photon laser scanning microscopy to directly quantify 
the remodelling of postsynaptic dendritic spines in the mPFC (Fig. 3a) 
during cued fear conditioning and extinction learning in transgenic 
THY1-YFP-H reporter mice, which express yellow fluorescent protein 
in neurons, following manipulation of the microbiota in adulthood. 
Postsynaptic dendritic spines are membranous protrusions on neuronal 
dendrites that form primarily excitatory synapses with presynaptic 
axonal inputs and are dynamically remodelled during learning and development®””.
Fear conditioning and extinction learning are correlated
with opposing effects on the formation and elimination of dendritic
spines in the mPFC*™*. We acquired images of the same dendritic spines
during a baseline period, and before and after fear conditioning and 
extinction learning (Fig. 3b, c). Compared to control mice, baseline spine 
elimination rates were significantly elevated in ABX mice (Fig. 3d, f), 
whereas baseline spine formation rates were unaffected (Fig. 3e, g). Con- 
sistent with previous findings™, cued fear conditioning and extinction 
learning had opposing effects on spine remodelling in control mice. Fear 
conditioning increased spine elimination rates in control mice (Fig. 3d), 
such that there was no significant difference in spine elimination or 
formation rates in the 24 h after conditioning between ABX mice and 
control mice (Fig. 3d-g). By contrast, extinction-learning-related spine 
remodelling was significantly altered in ABX mice. Extinction learning 
increased spine formation rates in control mice but not in ABX mice 
(Fig. 3e, g), and spine elimination rates remained persistently elevated 
in ABX mice relative to control mice (Fig. 3d, f). 

Consistent with elevated spine elimination in ABX mice, we observed 
comparable expression of the presynaptic marker synaptophysin but 
lower expression of the postsynaptic marker PSD-95 in the mPFC of GF 
mice compared to control mice (Fig. 3h-k). In addition, the expression of 
Dlg4 in excitatory neuron subset 1 (exPFC1) was downregulated in samples
from ABX mice compared to those from control mice. Together, these data
indicate that alterations in the composition of the microbiota are associated
with deficits in learning-induced spine plasticity. Notably, plasma corticosterone
levels were comparable in control, ABX and GF mice® (Extended Data
Fig. 10a, b), indicating that the function of the hypothalamic–pituitary–adrenal
axis may not be altered and is not a major driver of microbiota-mediated
changes in spine remodelling and fear extinction learning.


Defective tone-encoding ensembles in ABX mice 


To investigate whether signals derived from the microbiota regulate 
learning-related neuronal activity in the mPFC, we used two-photon 
imaging and a genetically encoded calcium sensor (AAV5-hSyn- 
GCaMP6s) to quantify neuronal activity during extinction learning in 
control and ABX mice (Fig. 4a—c). We identified mPFC neurons that 
encoded the conditioned stimulus, and identified two differentially 
responding functional cell types. One neuronal population (represent- 
ing 13.5% of all cells) exhibited equivalently reduced activity during tone 
presentations in both control and ABX mice (Fig. 4d, e). By contrast, a 
second population (representing 14.9% of all cells) displayed increased 
activity during tone presentations (Fig. 4f). In this latter population, neu- 
ronal activity during tone presentations was modestly but significantly 
lower in ABX mice than in control mice (Fig. 4g), consistent with the 
deficits in spine formation and behaviour in ABX mice. Notably, 26.8% of 
these neurons also encoded the precise timing of the tones, exhibiting 
tone-locked activity that increased and decreased in response to the 
onset and offset of each tone, respectively (Fig. 4h). Again, tone-locked 
activity in these multicellular tone-sensitive ensembles was significantly 
reduced in ABX mice compared to control mice (Fig. 4i). In conjunction 
with the RNA-seq and dendritic spine remodelling analyses, these data 
indicate that dysbiosis of the gut microbiota disrupts learning-related 
spine formation and interferes with the emergence of multicellular 
tone-encoding ensembles. 


Diverse microbiota restores extinction learning 


To test whether the extinction learning deficits caused by altered 
microbiota can be rescued by colonization with defined individual 
microorganisms or consortia of microorganisms, we performed fear 
conditioning and extinction learning in gnotobiotic mice colonized 
with bacteria that are known to influence other physiologic processes. 
Notably, following colonization with segmented filamentous bacterium 
(SFB)*°, Clostridium species”, Enterobacter species* or altered Schaedler 
flora (ASF), these gnotobiotic mice still exhibited impaired extinction 
learning compared to control mice (Fig. 5a), suggesting that a more
diverse microbiota is required for normal extinction learning and fear
extinction behaviour.

Fig. 4 | Defective ensemble calcium dynamics in the mPFC of ABX mice.
a, Example false-colour image (mean projection over time) of GCaMP6s-expressing
neurons in the mPFC. Scale bar, 50 μm. b, Segmentation of the
neurons in a. c, Neuronal activity (ΔF/F) extracted from the three example
neurons outlined in b. d–g, Population activity traces (mean ΔF/F ± s.e.m.) for
neurons exhibiting decreased (d) or increased (f) activity during tone
presentations in fear extinction session 3. Mean activity (ΔF/F) during each task
epoch (baseline, tone-on, tone-off; e) for the neuronal population depicted
in d presents a significant decrease in activity (repeated-measures analysis of
variance (ANOVA): main effect of time: F(10,1600) = 3.138, P = 0.007) but no
significant difference between groups (group-by-time interaction:
F(10,1600) = 2.736, P = 0.1280). NS, not significant (baseline, 0.285; tone-on,
0.595; tone-off, 0.578). Mean activity (ΔF/F) during each task epoch (g) for the
neuronal population depicted in f presents a significant increase in activity
(repeated-measures ANOVA: main effect of time: F(10,1770) = 4.945, P < 0.0001)
and a significant group-by-time interaction (F(10,1770) = 3.806, P = 0.0008).
*P = 0.013, significant group difference in post-hoc contrast. NS, not significant
(baseline, 0.128; tone-off, 0.601). Centre line, median; box, 25th and 75th
percentiles; whiskers, minimum to maximum in e, g. h, Raster plot of neuronal
activity for cells that encoded the timing of tones by increasing and decreasing
activity in response to tone onset and offset, respectively. Each row represents one
neuron. i, Population activity trace (mean ΔF/F ± s.e.m.) for neurons depicted in
h, time-locked to tone onset and averaged across tones. Repeated-measures ANOVA:
main effect of time: F(179,8234) = 7.033, P < 0.0001; group-by-time interaction:
F(179,8234) = 2.749, P = 0.0093. Data in d–i based on 1,204 total cells pooled from 3
independent experiments, from n = 7 control mice and n = 8 ABX mice.

To investigate whether the extinction learning deficits caused by 
altered microbiota are reversible, we recolonized previously germ-free 
(ex-GF) mice with a complete microbiota from healthy control mice at 
various developmental time points. Both ex-GF mice colonized when 
they were adults (ex-GF_adult) and ex-GF mice colonized at weaning 
age (ex-GF_weaning) still displayed impaired fear extinction compared 
to control mice (Fig. 5b, c), indicating that extinction learning deficits 
were not reversible in GF mice after weaning. However, when ex-GF 
mice were colonized immediately after birth by fostering to microbi- 
ota-replete specific-pathogen-free (SPF) surrogate mothers (ex-GF_ 
fostered mice), their fear extinction behaviour was comparable to that of 
control_fostered mice (Fig. 5d), indicating that extinction learning and 
learning-related plasticity require microbiota-derived signals during a
in microbiota-derived metabolites. a–d, Fear extinction in control, GF and
critical developmental period before weaning. Lack of the microbiota
in the neonatal period, regardless of whether mice were microbially colonized
after weaning, results in deficits in fear extinction learning in adulthood
(Fig. 5b-d). Whereas fear extinction was restored in ex-GF_fostered 
mice, we found no significant shift in the transcriptome of the mPFC of 
ex-GF_fostered mice or control_fostered mice (compared to GF mice) 
after the single-session fear extinction training (data not shown). This 
could reflect the shorter extinction model used in GF mice, which may 
induce smaller transcriptional changes than the protocol involving one 
session per day for three days that was used in ABX mice. Alternatively, 
the lack of transcriptional changes in the GF fostering studies could 
indicate the involvement of other processes such as post-translational 
or epigenomic modifications. 


Altered CSF metabolites and fear extinction 


Next, we investigated whether changes in neuronal development 
and fear extinction learning were associated with alterations in
microbiota-derived metabolites. We performed untargeted comparative
metabolomics of cerebrospinal fluid (CSF), serum and faecal samples
from adult GF mice, control_fostered mice and ex-GF_fostered mice
using high-resolution liquid chromatography and mass spectrometry
(LC-MS). Using the xcms platform” for comparative analysis of the mass
spectrometry datasets, we identified four metabolites—phenyl sulfate, 
pyrocatechol sulfate, 3-(3-sulfooxyphenyl)propanoic acid (all phenolic 
compounds) and indoxyl sulfate (Extended Data Fig. 10c)—that were 
significantly decreased in CSF, serum and faecal samples of GF mice 
compared to control_fostered mice, and were restored in ex-GF_fostered 
mice (Fig. 5e, f, Extended Data Fig. 10d). Downregulation of the same
four metabolites was also detected when we compared GF with control 
CSF samples (Extended Data Fig. 10e), as well as in comparisons of ABX 
versus control samples (data not shown). Notably, 3-(3-hydroxyphenyl)- 
3-hydroxypropanoic acid (HPHPA, a derivative of 3-(3-sulfooxyphenyl) 
propanoic acid) and indoxyl sulfate have been associated with neuropsy- 
chiatric disorders such as impaired executive function, schizophrenia 
and autism in humans* *°. Other microbiota-derived phenolic com- 
pounds such as 4-ethylphenylsulfate have been reported to influence 
autism-related behaviours in mice’, suggesting that these microbiota- 
derived compounds may have conserved roles in the development and 
function of neurons in multiple contexts. 


Discussion 


This study informs a model in which alterations in exposure to the 
microbiota in neonatal and adult mice can have considerable and 
long-lasting effects on neuronal function and learning-related plastic- 
ity that subsequently regulate fear extinction behaviour (Extended 
Data Fig. 10f). From bulk RNA-seq and snRNA-seq data, the deficits in 
extinction learning correlate with malfunctions of the mPFC, notably 
in excitatory neurons. Transcranial two-photon live imaging confirmed 
the changes in neurons in the ABX mice (specifically, reduced extinc- 
tion learning-associated spine formation and altered learning-related 
neuronal activity). Given that the vagus nerve does not contribute to the 
extinction learning deficits in ABX mice in this setting, the microbiota 
may affect the CNS through circulating microbiota-derived metabolites, 
directly influencing excitatory neurons in the mPFC and leading to deficits
in extinction learning. In addition, microbiota-derived metabolites
may also influence other cell subsets in the mPFC (such as microglia), 
and indirectly affect the excitatory neurons and behaviour. In accord- 
ance with this, and consistent with other studies”, we found that the 
microglia in GF and ABX mice exhibited an immature state reminiscent 
of developing juvenile microglia, which may contribute to elevated 
spine pruning and reduced extinction-learning-associated spine for- 
mation. Future studies are necessary to determine the molecular basis 
of changes to cellular activity in the mPFC induced by alterations in 
microbiota-derived metabolites. 

In summary, our findings offer one compelling explanation for the
notable deficits in fear extinction learning in ABX and GF mice, and 
suggest that alterations in microbiota-derived metabolites contribute 
to altered neuronal activity and behaviour. Coupled with the growing 
literature on the influence of the microbiota, diet and lifestyle on chronic 
inflammatory diseases, metabolic health and cancer, the effects of ben- 
eficial microorganisms on brain health and behaviour highlight the need 
to better define the co-evolved relationship between the microbiota, 
the nervous system and mammalian behaviour.

Methods


No statistical methods were used to predetermine sample size. The 
experiments were not randomized and investigators were not blinded 
to allocation during experiments and outcome assessment, except for 
dendritic spine imaging data analysis, in which the raters were blinded 
to experimental conditions. 


Mice 

C57BL/6J (Jax 664), Rag1−/− (Jax 2216), Thy1-YFP-H (Jax 3782) and BALB/c
(Jax 651) mice were purchased from The Jackson Laboratory and bred 
in-house. Male mice were used at 7-16 weeks of age. In individual experi- 
ments, all mice were age-matched. All mice were maintained under SPF 
conditions on a 12-h light-dark cycle, and provided food and water ad 
libitum. Germ-free C57BL/6J mice and gnotobiotic mice were maintained 
at Weill Cornell Medical College, New York. All mouse experiments were 
approved by, and performed in accordance with, the Institutional Animal 
Care and Use Committee guidelines at Weill Cornell Medicine. 


Antibiotic treatment 

Mice were provided autoclaved drinking water supplemented with 
a cocktail of broad-spectrum antibiotics as previously described”: 
ampicillin (0.5 mg/ml, Santa Cruz), gentamicin (0.5 mg/ml, Gemini Bio- 
Products), metronidazole (0.5 mg/ml, Sigma), neomycin (0.5 mg/ml,
Sigma), vancomycin (0.25 mg/ml, Chem-Impex International), and sac- 
charin (4 mg/ml, Sweet’N Low, Cumberland Packing). Sweet’N Low was 
added to make the antibiotic cocktail more palatable. Antibiotic treat- 
ment was started two weeks before the experiments and continued for 
the duration of the experiments. Following antibiotic treatment, mice
exhibited no significant differences in weight gain, food or water intake 
(measured by Promethion metabolic cages) or perception of pain*®. 


Fear conditioning and extinction assays 

Fear conditioning and extinction assays were performed as previously 
described’*”. For fear conditioning, mice were placed in shock chambers 
(Coulbourn Instruments), which were scented with 0.1% peppermint in 
70% EtOH. After 2 min of habituation, mice were fear conditioned with 
3 tone-shock pairings consisting of a 30-s (5 kHz, 70 dB) tone (conditioned
stimulus, CS) that co-terminated with a 1-s (0.7 mA) foot shock
(unconditioned stimulus, US). Intertrial intervals (ITIs) between each 
tone-shock pairing were 30 s. After the final tone-shock pairing, mice
remained in the conditioning chambers for 1 min before being returned
to their home cages. 
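
The timing of the conditioning session described above can be laid out as an event schedule. This is a minimal sketch for illustration: the `Event` class and `conditioning_schedule` function are hypothetical names, and the code assumes that the intertrial interval runs from tone offset to the next tone onset, which the text does not state explicitly.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Event:
    kind: str       # "tone" (CS) or "shock" (US)
    start_s: float  # onset, seconds from session start
    end_s: float    # offset, seconds from session start


def conditioning_schedule(
    n_pairings: int = 3,
    habituation_s: float = 120.0,  # 2 min habituation
    tone_s: float = 30.0,          # 30-s tone (CS)
    shock_s: float = 1.0,          # 1-s foot shock (US)
    iti_s: float = 30.0,           # assumed: tone offset to next tone onset
    post_s: float = 60.0,          # 1 min in chamber after the last pairing
) -> Tuple[List[Event], float]:
    """Return the list of CS/US events and the total session length in seconds."""
    events: List[Event] = []
    t = habituation_s
    last_end = t
    for _ in range(n_pairings):
        tone_end = t + tone_s
        events.append(Event("tone", t, tone_end))
        # The shock co-terminates with the tone, so it occupies its final second.
        events.append(Event("shock", tone_end - shock_s, tone_end))
        last_end = tone_end
        t = tone_end + iti_s
    return events, last_end + post_s


events, session_length_s = conditioning_schedule()
for e in events:
    print(f"{e.kind:5s} {e.start_s:6.1f} to {e.end_s:6.1f} s")
print("session length (s):", session_length_s)
```

Under these assumptions the three tones begin at 120 s, 180 s and 240 s, and the session lasts 330 s in total.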

For the classical 3-day (that is, 3-session) fear extinction, 24 h after 
fear conditioning, mice were placed in extinction chambers (different 
shape from the conditioning chambers), which were scented with 0.1%
lemon in 70% EtOH. After 2 min of habituation, mice were exposed to 
5 presentations of the tone (CS) in the absence of the shock (US). Each 
tone lasted for 30 s with an ITI of 30 s. After the final tone presentation,
mice remained in the extinction chambers for 1 min before being
returned to their home cages. Fear extinction sessions were repeated 
daily for three days. 

For single-session 30-tone fear extinction, 20 min after fear condition- 
ing, mice were placed in extinction chambers. After 2 min of habituation, 
mice were exposed to 30 presentations of the tone (CS) in the absence 
of the shock (US). Each tone lasted for 30 s with an ITI of 30 s. Extinction
trials were binned into early and late sessions, with the early session
representing the average of trials 1–15, and the late session representing the
average of trials 16-30. 
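
The binning of the 30-tone session can be sketched as follows. The function name and the example freezing values are hypothetical and serve only to illustrate the early (trials 1–15) and late (trials 16–30) averages described above.

```python
from statistics import mean
from typing import Sequence, Tuple


def bin_early_late(freezing_by_trial: Sequence[float], split: int = 15) -> Tuple[float, float]:
    """Average per-trial freezing (%) into an early and a late bin.

    Expects exactly 2 * split trials (30 by default): trials 1..split form
    the early bin and trials split+1..2*split form the late bin.
    """
    if len(freezing_by_trial) != 2 * split:
        raise ValueError(f"expected {2 * split} trials, got {len(freezing_by_trial)}")
    early = mean(freezing_by_trial[:split])
    late = mean(freezing_by_trial[split:])
    return early, late


# Hypothetical mouse whose freezing declines linearly from 87% to 0% over 30 tones.
trials = [float(3 * (29 - i)) for i in range(30)]
early, late = bin_early_late(trials)
print(early, late)
```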

Experiments were controlled by Graphic State software (Coulbourn 
Instruments). Mice were video-recorded for subsequent analysis.


Fear behaviour 
Mouse freezing behaviour was scored automatically using previously val- 
idated MATLAB code for automated phenotyping of mouse behaviour**. 


Per cent time spent freezing (freezing (%)) was calculated by dividing 
the amount of time spent freezing during the 30-s tone presentations 
by the duration of the tone. 
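
The freezing calculation, together with the per-mouse area under the curve (AUC) comparison mentioned in the figure legends, can be sketched as below. This is an illustration under stated assumptions, not the validated MATLAB code used in the study: the function names are hypothetical, the AUC uses the trapezoid rule with unit spacing between sessions, and the t statistic assumes a pooled-variance (Student's) test, because the text specifies only an unpaired two-sided t-test. The example data are invented.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def percent_freezing(freezing_s: float, tone_s: float = 30.0) -> float:
    """Freezing (%) = time spent freezing during the tone / tone duration."""
    return 100.0 * freezing_s / tone_s


def auc_trapezoid(values: Sequence[float]) -> float:
    """Area under one mouse's freezing curve, assuming unit spacing between points."""
    return sum((a + b) / 2.0 for a, b in zip(values, values[1:]))


def students_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Pooled-variance t statistic and degrees of freedom for two independent groups."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1.0 / na + 1.0 / nb))
    return t, na + nb - 2


# Hypothetical freezing (%) per extinction session for three mice per group.
ctrl = [[80.0, 50.0, 30.0], [70.0, 45.0, 25.0], [75.0, 55.0, 35.0]]
abx = [[80.0, 70.0, 65.0], [75.0, 72.0, 60.0], [85.0, 68.0, 66.0]]

ctrl_auc = [auc_trapezoid(m) for m in ctrl]  # one AUC per mouse
abx_auc = [auc_trapezoid(m) for m in abx]
t_stat, dof = students_t(ctrl_auc, abx_auc)
print(percent_freezing(15.0), ctrl_auc, abx_auc, t_stat, dof)
```

Reducing each mouse to a single AUC before testing keeps the mouse, rather than the individual tone, as the unit of analysis. A two-sided P value would then be read from the t distribution with the returned degrees of freedom.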


Immunofluorescence staining 
Brain sections were prepared and stained for c-FOS, synaptophysin or 
PSD-95 expression as previously described”. All steps were carried out 
at room temperature unless otherwise specified. Ninety minutes after 
fear extinction session 3, mice were anaesthetized by intraperitoneal 
injection of Euthasol and perfused with PBS followed by 4% paraformal- 
dehyde. Brains were collected, fixed in 4% paraformaldehyde overnight, 
and dehydrated in 30% sucrose at 4 °C. Coronal sections (40 μm) were
cut using a sliding microtome frozen by powdered dry ice. Six sets of 
serial sections were collected in Eppendorf tubes each containing 2 ml 
cryoprotectant (30% glycerol and 30% ethylene glycol in 0.1 M sodium
phosphate, pH 7.4) and stored at −20 °C. Free-floating serial sections (1 in
every 3) were washed in TBS, incubated for 30 min in blocking buffer (4% 
normal horse serum, 1% BSA and 0.2% Triton X-100 in TBS) and incubated 
overnight at 4 °C with rabbit anti-c-FOS primary antibody (sc-52, Santa 
Cruz), or mouse anti-synaptophysin (SAB4200544, Sigma-Aldrich) or 
PSD-95 (7E3-1B8, Sigma-Aldrich) diluted 1:1,000 in the blocking buffer. 
Sections were then washed in TBS and incubated for 2 h with Alexa-Fluor- 
555-labelled donkey anti-rabbit or anti-mouse antibody (Invitrogen) 
diluted 1:500 in TBS with 0.2% Triton X-100. Sections were again washed, 
mounted on chromalum/gelatin-coated slides and air-dried for 2 h in
darkness. Slides were coverslipped with water-soluble glycerol-based 
mounting medium containing DAPI and sealed with nail polish. 
Estimation of cell densities of c-FOS+ neurons in BLA and IL was performed
with StereoInvestigator 9.0 (MicroBrightField). In brief, serial
sections (every third section, 120 μm) were numbered by rostro-caudal
order, and contours of BLA and IL were traced by referring to the Allen 
Brain Atlas (Allen Institute). All cells across all sections per mouse were 
counted. Individual cell density was calculated for each mouse by divid- 
ing the total sampled cell numbers by the total volume of the region. 
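
The density calculation can be illustrated with a short sketch. The function name and example numbers are hypothetical, and the code assumes that the sampled volume is the sum of the traced contour areas multiplied by the 40-μm section thickness; the text does not specify how the region volume was derived in the stereology software.

```python
from typing import Sequence


def cell_density_per_mm3(
    counts: Sequence[int],
    contour_areas_um2: Sequence[float],
    section_thickness_um: float = 40.0,
) -> float:
    """Cells per cubic millimetre for one mouse and one brain region.

    counts[i] is the number of labelled cells in section i and
    contour_areas_um2[i] is the traced region area in that section.
    Assumed volume: sum(area) x section thickness (1 mm^3 = 1e9 um^3).
    """
    if len(counts) != len(contour_areas_um2):
        raise ValueError("need one contour area per counted section")
    volume_mm3 = sum(contour_areas_um2) * section_thickness_um / 1e9
    if volume_mm3 <= 0:
        raise ValueError("total volume must be positive")
    return sum(counts) / volume_mm3


# Hypothetical region traced in two sections of 1 mm^2 (1e6 um^2) each.
print(cell_density_per_mm3([10, 20], [1e6, 1e6]))
```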
For synaptophysin and PSD-95 images, confocal microscopy was 
performed with a Zeiss LSM 880 Laser Scanning Confocal Microscope 
using 63x oil immersion lens. Images were acquired with 2x digital zoom. 
Image stacks were 5 μm in thickness with z-step size of 0.5 μm, and were
analysed using ImageJ software (http://rsbweb.nih.gov/ij/). 


Intracranial window implantation 

Mice were anaesthetized with isoflurane (induction, 5%; maintenance, 
1-2%) and administered dexamethasone (1 mg/kg, i.p.) to reduce brain 
swelling and metacam (2 mg/kg, i.p.) as a prophylactic analgesic. Scalp 
fur was trimmed, and the skull surface was exposed with a midline scalp 
incision. Bupivacaine (0.05 ml, 5 mg/ml) was administered topically as 
a second prophylactic analgesic. A circular titanium head plate was 
positioned over the region to be imaged (1.7 mm anterior to the bregma 
suture and centred over the midline) using dental cement (Metabond). A 
high-speed dental drill (Model EXL-M40, Osada) and 0.5-mm burr were 
used to open a small (about 4-mm) craniotomy. A 3-mm round coverslip 
(Warner Instruments) was lowered through the craniotomy to rest on top 
of the brain using a digital micromanipulator. The window was then fixed 
to the skull using veterinary adhesives (first Vetbond, then Metabond). 


Viral injection 

AAV5-hSyn-GCaMP6s was obtained from the UPenn Vector Core. Viral 
injection surgeries were performed with mice (8-10 weeks of age) under 
isoflurane anaesthesia (induction, 5%; maintenance, 1-2%) with regular 
monitoring for stable respiratory rate and absent tail pinch response. 
The scalp was shaved, and mice were fixed in a stereotactic frame (Kopf 
Instruments) with non-rupturing ear bars. A heating pad was used to 
prevent hypothermia. A midline incision was made to expose the skull 
and bupivacaine was applied onto the skull for local anaesthesia. Virus 
injections (1,000 nl) were delivered with a 10-μl Hamilton syringe and 
33-gauge bevelled needle, injected at 100 nl/min using an injection 
pump (World Precision Instruments). Injection coordinates relative to 
Bregma were: 1.7 mm anterior, 0.4 mm lateral, and 1.3 mm ventral. Fol- 
lowing injection, the injection needle was held at the injection site for 
2 min then slowly withdrawn. The skin was then closed using Vetbond 
(3M) and the mice recovered on a heating pad before being returned 
to their home cages. 


Transcranial two-photon imaging 

Dendritic spine imaging was conducted as previously described“. In 
brief, image stacks of dendritic segments were acquired using a two- 
photon laser scanning microscope (Olympus RS) equipped with a scan- 
ning galvanometer and a Spectra-Physics Mai Tai DeepSee laser tuned to 
920 nm, and a 25×, long working distance water-immersion microscope 
objective (NA = 1.05, Olympus). Fluorescence was detected through 
gallium arsenide phosphide photomultiplier tubes using the Fluoview 
acquisition software (Olympus), and images were collected in the green 
channel using an F30FGR bandpass filter (Semrock). All imaging experi- 
ments began by obtaining a low-magnification z-stack (no digital zoom) 
to aid in relocating the same sites repeatedly over time, in conjunction 
with vascular landmarks and the contours of the prism. For spine imag- 
ing experiments, we acquired z-stacks (512 × 512 pixels, 2-μs pixel dwell 
time, 0.75-1-μm step size) with 3× digital zoom through up to 250 μm of 
tissue in z. Spine imaging experiments occurred under KX anaesthesia 
(ketamine 100 mg/ml and xylazine 10 mg/ml, at dosages of 0.1 ml/10 g 
body weight). For calcium imaging experiments, we acquired time-lapse 
images (512 × 512 pixels, 3 frames per second, about 1,450 frames) span- 
ning an area of the mPFC measuring approximately 508 μm by 508 μm. 
All calcium imaging experiments were carried out on awake mice. For 
repeated imaging over intervals of days, the procedure above was 
repeated, and the region to be imaged was identified by referring to 
vascular landmarks and the contours of the cranial window. 


Spine imaging analysis 

Spine remodelling dynamics were quantified as previously described®. 
Image stacks were analysed using ImageJ software 
(http://rsbweb.nih.gov/ij/). Raters blinded to experimental condition 
compared pairs of 
images of the same dendritic segment and identified stable spines (pre- 
sent in image 1 and 2), eliminated spines (present in image 1 but not in 
image 2) and formed spines (present in image 2 but not in image 1), each 
quantified as a percentage of the total number of spines identified in 
the initial image. Filopodia were defined as dendritic protrusions with a 
length exceeding three times their maximum width and were excluded 
from spine remodelling analyses. 
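
The spine categories above can be expressed as set operations on spine identities across the two images. The sketch below is illustrative only: it assumes each spine has already been given a consistent label by the rater (the labels and function name are hypothetical), and it expresses every category as a percentage of the spines present in the first image, as described above.

```python
def spine_turnover(spines_image1, spines_image2):
    """Classify spines across two images of the same dendritic segment.

    Percentages are relative to the number of spines in the first image.
    """
    first, second = set(spines_image1), set(spines_image2)
    if not first:
        raise ValueError("the initial image must contain at least one spine")
    n_initial = len(first)
    return {
        "stable_pct": 100.0 * len(first & second) / n_initial,      # in both images
        "eliminated_pct": 100.0 * len(first - second) / n_initial,  # lost in image 2
        "formed_pct": 100.0 * len(second - first) / n_initial,      # new in image 2
    }


# Hypothetical rater labels: s4 is lost, s5 and s6 are newly formed.
result = spine_turnover({"s1", "s2", "s3", "s4"}, {"s1", "s2", "s3", "s5", "s6"})
```

Because formed spines are normalized to the initial count, the three percentages need not sum to 100.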


Calcium imaging analysis 

Preprocessing. We used standard, validated procedures for preprocess- 
ing and analysing calcium imaging time series data. X-Y motion artefacts 
were corrected using the ImageJ plugin ‘Image Stabilizer’ created by 
K. Li and S. Kang (https://www.cs.cmu.edu/~kangli/code/Image_Stabilizer.html). 
Image time series were segmented into individual cells using 
custom MATLAB scripts based on an established sorting algorithm that 
combines independent components analysis and image segmentation 
based on threshold intensity, variance, and skewness in the X-Y motion- 
corrected dataset*’”. Image segmentation results were manually in- 
spected for quality control. Fluorescence signal time series (ΔF/F: change 
in fluorescence divided by baseline fluorescence) were calculated for 
each individual neuronal segment: a 40-s sliding window was used to 
calculate the baseline fluorescence for each cell, accounting for both 
differences in GCaMP expression and de-trending for slow timescale 
changes in fluorescence™. 
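
As an illustration of the ΔF/F calculation, the sketch below computes a sliding-window baseline for one cell. The text specifies only a 40-s window; taking the baseline as the median of a window centred on each frame is an assumption made here, as are the function and parameter names. The 3 Hz default matches the frame rate given for calcium imaging.

```python
from statistics import median


def delta_f_over_f(trace, frame_rate_hz=3.0, window_s=40.0):
    """Return (F - F0) / F0 for each frame of a raw fluorescence trace.

    F0 is the median of a window centred on the frame (an assumption; the
    source states only that a 40-s sliding window was used). The window is
    truncated at the start and end of the recording.
    """
    half = int(round(window_s * frame_rate_hz / 2))
    dff = []
    for i, f in enumerate(trace):
        lo = max(0, i - half)
        hi = min(len(trace), i + half + 1)
        baseline = median(trace[lo:hi])
        if baseline == 0:
            raise ValueError("baseline fluorescence must be non-zero")
        dff.append((f - baseline) / baseline)
    return dff


# Hypothetical trace: flat baseline of 100 with a single transient of 150.
trace = [100.0] * 200
trace[100] = 150.0
dff = delta_f_over_f(trace)
```

A local baseline of this kind normalizes for expression level (division by F0) and removes slow drift (F0 follows the trace over tens of seconds).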


Analysis. First, we tested for cells exhibiting tone-sensitive activity, 
using repeated-measures ANOVA to identify cells with a statistically 
significant increase (Fig. 4f) or decrease (Fig. 4d) in activity during tone 
presentations, compared to their activity during a two-minute pre-tone 
baseline period. To estimate statistical significance while accounting 
for autocorrelation in calcium transient time series and correcting for 
multiple comparisons, we repeated this analysis 10,000 times for each 
cell after shuffling the timing of the baseline period and the timing of 
the tone onsets and selected a P value threshold to limit the FDR to less 
than 5%. Next, to test for group (ABX versus control) effects on activity in 
each of these cell populations (Fig. 4e, g), we used a two-factor (time and 
group) repeated-measures ANOVA and post-hoc linear contrasts to 
test for between-group differences in activity during each task epoch 
(baseline, tone on and tone off). Finally, to test for cells that also encoded 
the precise timing of the tones, exhibiting tone-locked activity that 
increased or decreased in response to the onset of each tone, we used 
a procedure analogous to the one described above, using repeated- 
measures ANOVA to test for changes in activity in the tone on versus tone 
off epochs; estimating statistical significance in shuffled data as above; 
and testing for group effects on activity using a two-factor (time and 
group) repeated-measures ANOVA (Fig. 4h, i). 
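
The shuffle-based significance estimate can be sketched in two steps: an empirical P value for each cell from its shuffled-data null distribution, and a P value cut-off that controls the FDR across cells. The text does not name the FDR procedure; the Benjamini-Hochberg step-up rule is used here purely as an assumption, and the function names and example values are hypothetical.

```python
def empirical_p(observed_stat, shuffled_stats):
    """One-sided empirical P value of an observed statistic against a null
    distribution obtained from shuffled data (with the usual +1 correction)."""
    exceed = sum(1 for s in shuffled_stats if s >= observed_stat)
    return (exceed + 1) / (len(shuffled_stats) + 1)


def bh_threshold(p_values, fdr=0.05):
    """Largest P value that passes the Benjamini-Hochberg step-up rule.

    Returns 0.0 if no P value passes. Cells with p <= threshold are called
    significant. (Assumed procedure; not specified in the source.)
    """
    ranked = sorted(p_values)
    m = len(ranked)
    threshold = 0.0
    for rank, p in enumerate(ranked, start=1):
        if p <= fdr * rank / m:
            threshold = p
    return threshold


# Hypothetical per-cell P values.
cell_p = [0.001, 0.008, 0.039, 0.041, 0.6]
cutoff = bh_threshold(cell_p, fdr=0.05)
p_example = empirical_p(5.0, [1.0, 2.0, 3.0, 6.0])
```

With 10,000 shuffles per cell, the smallest attainable empirical P value under this formula is 1/10,001.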


RNA-seq 

Mouse mPFC was dissected by referring to the Allen Brain Atlas. Coor- 
dinates relative to bregma are: 1.3 to 2.8 mm anterior, −1 to 1 mm lateral, 
and 0 to 1 mm ventral. RNA was extracted using Trizol (Invitrogen) and 
chloroform and further purified using the RNeasy mini spin columns 
(Qiagen). RNA-seq libraries were prepared and sequenced by the Epi- 
genomics Core at Weill Cornell Medicine on an Illumina HiSeq 2500, 
producing 50-bp single-end reads. Sequenced reads were demultiplexed 
using CASAVA v1.8.2 and adapters trimmed using FLEXBAR v2.4”. 


RNA-seq analysis 

Sequenced reads were aligned to the mouse genome GRCm38/mm10 
using STAR v2.3.0. Read counts at the gene level were calculated using 
Rsubread*. Normalization for library size and differential expression 
analysis were performed using DESeq2* v1.18. Only genes with at least 
ten raw reads in each sample were tested for differential expression. 
PERMANOVA* was used to test whether antibiotic treatment accounted 
for a significant portion of the variance in gene expression after fear 
extinction. Specifically, expression of the 500 genes with the highest 
variance (after applying the variance stabilization transformation of 
DESeq2*) was analysed using the adonis function of the vegan R pack- 
age (https://CRAN.R-project.org/package=vegan) using the Euclidean 
metric and 20,000 permutations. Differentially expressed genes 
were used for GO enrichment analysis (http://www.pantherdb.org/)”, 
KEGG analysis (https://www.genome.jp/kegg)** and STRING analysis 
(https://string-db.org/)”. 
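
Two of the selection steps above are simple filters and can be sketched without any sequencing library. The code is a hypothetical illustration: the data layout (dictionaries mapping gene names to per-sample values) and function names are assumptions, and restricting the variance ranking to genes that passed the read-count filter is one possible reading, since the text does not state which gene set the top 500 were drawn from.

```python
from statistics import variance


def expressed_genes(raw_counts, min_reads=10):
    """Genes with at least `min_reads` raw reads in every sample."""
    return [g for g, counts in raw_counts.items() if min(counts) >= min_reads]


def top_variable_genes(transformed, genes, top_n=500):
    """The `top_n` genes with the highest variance across samples, computed
    on variance-stabilized values supplied by the caller."""
    ranked = sorted(genes, key=lambda g: variance(transformed[g]), reverse=True)
    return ranked[:top_n]


# Hypothetical counts and transformed values for three samples.
raw = {"A": [12, 30, 15], "B": [9, 40, 50], "C": [100, 120, 90]}
vst = {"A": [3.0, 5.0, 4.0], "B": [2.0, 6.0, 7.0], "C": [7.0, 7.1, 6.9]}
kept = expressed_genes(raw)
top = top_variable_genes(vst, kept, top_n=1)
```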


Brain-resident immune cell isolation and flow cytometry 
Brain-resident immune cells were isolated using Percoll gradients®. 
Mice were anaesthetized and perfused with ice-cold HBSS. Brains were 
removed, homogenized, resuspended with 30% Percoll, and layered on 
top of 70% Percoll. After centrifugation (500g, 30 min), immune cells 
gathered in the 30-70% interphase. 

For flow cytometric analyses, cells were washed, incubated with 
purified anti-mouse CD16/CD32 (clone 93, Biolegend) to block the Fc 
receptors, and then stained with anti-CD45 (clone 30-F11, Biolegend), 
anti-CD4 (clone RM4-5, eBioscience), anti-CD8a (clone 53-6.7, BD Bio- 
sciences), anti-CD19 (clone 1D3, eBioscience), anti-CD11b (clone M1/70, 
eBioscience), anti-CD11c (clone N418, eBioscience), anti-F4/80 (clone 
BM8, eBioscience), anti-LY6G (clone 1A8-Ly6g, eBioscience), anti-LY6C 
(clone HK1.4, eBioscience) and anti-CSF1R (clone AFS98, eBioscience). 
Data were collected on an LSRFortessa cytometer (BD Biosciences) and 
analysed with FlowJo software (Tree Star). Dead cells were excluded 
from analyses based on LIVE/DEAD Fixable Aqua dead cell staining (Inv- 
itrogen). Non-singlet events were excluded from analyses based on 
the side scatter height (SSC-H) versus side scatter width (SSC-W), and 
then the forward scatter height (FSC-H) versus forward scatter width 
(FSC-W) characteristics. 


Microbial colonization 

For ex-GF mice colonized when they were adults (ex-GF_adult), dirty 
bedding from SPF mice was placed in the GF cages of 8-week-old GF mice 
two weeks before the fear conditioning and extinction assay. For ex-GF 
mice colonized when they were weaned (ex-GF_weaning), 3-week-old GF 
mice were co-housed with 3-week-old SPF mice until they were 8 weeks 
old and then subjected to the fear conditioning and extinction assay. 


Fostering of pups 
C57BL/6J GF and control SPF newborn pups were fostered by BALB/c 
mothers until weaning. 


16S qPCR 

DNA was isolated from faecal samples of control and ABX mice using 
the DNeasy PowerSoil kit (Qiagen). Equal amounts of purified faecal 
DNA (4 ng per reaction) were added to qPCR reactions with universal 
16S primers using SYBR green chemistry (UniF340: 
5’-ACTCCTACGGGAGGCAGCAGT-3’; UniR514: 5’-ATTACCGCGGCTGCTGGC-3’). 16S DNA 
levels in each sample were normalized to the average of the control 
mouse group. 


16S amplicon sequencing and analysis 

16S rRNA gene sequencing methods were adapted from the methods 
developed for the NIH-Human Microbiome Project (https://hmpdacc.org/). 
In brief, bacterial genomic DNA was extracted using the MO BIO PowerSoil 
DNA Isolation Kit (MO BIO Laboratories). The 16S rDNA V4 region 
was amplified by PCR and sequenced in the MiSeq platform (Illumina) 
using the 2 x 250-bp paired-end protocol. Raw reads were processed and 
clustered into operational taxonomic units (OTUs) using USEARCH ver- 
sion 11°. Specifically, reads were demultiplexed and read pairs merged, 
with a maximum of five mismatching bases in the overlap region, as well 
as a minimum sequence agreement of 80%. PhiX contaminant sequences 
were removed, and merged sequences were filtered according to FASTQ 
quality scores using a maximum expected error number of 0.1. Filtered 
sequences were clustered into OTUs at a 97% identity threshold using 
the USEARCH cluster_otus command with default settings. Merged 
reads (unfiltered) were mapped to the OTU representative sequences, 
generating an OTU table. Taxonomic classification of OTU representative 
sequences was performed with the USEARCH SINTAX command with a 
confidence score of 0.8, using version 16 of the RDP 16S training set™. 
Diversity estimation and principal coordinates analysis (PCoA) ordina- 
tion were performed using the phyloseq R package” after subsampling 
the OTU table to even depth. 
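
The expected-error filter applied to merged reads can be illustrated directly from the FASTQ quality string. The expected number of errors in a read is the sum of the per-base error probabilities, each given by 10^(-Q/10) for Phred score Q. The sketch below is not USEARCH code; it assumes Phred+33 encoding, and the function names are hypothetical.

```python
def expected_errors(quality_string, phred_offset=33):
    """Sum of per-base error probabilities implied by a FASTQ quality string."""
    return sum(10 ** (-(ord(ch) - phred_offset) / 10) for ch in quality_string)


def passes_filter(quality_string, max_ee=0.1):
    """Keep a read only if its expected error count does not exceed max_ee."""
    return expected_errors(quality_string) <= max_ee


# 'I' encodes Q40, '5' encodes Q20 and '+' encodes Q10 under Phred+33.
ee_high_quality = expected_errors("IIII")   # 4 bases at Q40
keep_q20 = passes_filter("5555")            # 4 x 0.01 = 0.04 expected errors
keep_q10 = passes_filter("++++")            # 4 x 0.1 = 0.4 expected errors
```

A cut-off of 0.1 is strict: a 250-base read passes only if its mean per-base error probability is at most 0.0004.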


snRNA-seq 

Nuclei were extracted from four frozen mPFC samples (two from ABX 
mice and two from control mice) with a glass dounce tissue grinder 
set (Millipore Sigma no. D8938) and Nuclei EZ Prep (Millipore Sigma 
no. NUC101-1kt). Each sample was dounced with pestles A and B (24x 
each) in 2 ml EZ prep buffer, washed with 5 ml EZ prep, and resuspended 
in 1 ml resuspension buffer (1x PBS, 0.1% BSA, 25 U/ml recombinant 
RNase inhibitor, Takara 2313B). Single nucleus suspensions were strained 
through a 35-μm cell strainer (Corning 352235), visually inspected under 
a microscope, and loaded onto 3’ library chips as per the manufacturer’s 
protocol for the Chromium Single Cell 3’ Library & Gel Bead kit (v.3) 
(10X Genomics 1000092). For each sample, an input of 11,000 nuclei 
was added to each channel. Libraries were sequenced at a mean depth 
of 21,714 reads per nucleus on a HiSeq X. 


snRNA-seq data processing 
Demultiplexed FASTQ files were generated using Cell Ranger v2.0. Reads 
were aligned to the mm10 mouse transcriptome containing pre-mRNA 
annotations, similarly to previously described™, to generate raw gene 
expression matrices (nuclei by genes). Expression matrices across all 
four samples were merged and loaded into Scanpy (version 1.4.0)®. 
Genes found in fewer than three nuclei were filtered out. Nuclei were 
filtered out using the following criteria: fewer than 600 genes (likely to 
be empty droplets), more than 5,000 genes (likely to be doublets), >2% 
of reads mapping to mitochondrial genes, >0.1% of reads mapping to 
caspase genes to remove apoptotic cells. The resulting filtered matrix 
consisted of 38,649 nuclei and 22,451 genes. The filtered gene expression 
matrix was normalized within each nucleus, resulting in a filtered, nuclei- 
normalized matrix X, then log-normalized by calculating ln(X + 1). Before 
selecting variable genes, we masked genes that contain highly repetitive 
regions in intronic regions that result in inflated read counts (PISD, Mylip, 
Gm17660) and highly expressed lncRNAs that affect within-nuclei nor- 
malization (Gm28928, Malat1). We selected 1,535 highly variable genes 
using the highly_variable_genes module in scanpy (min_mean = 0.1, 
max_mean = 3, min_disp = 0.8) for clustering analysis. 
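
The nucleus-level quality filters and the log-normalization step can be written out explicitly. This is a plain-Python sketch and not the Scanpy calls used in the study; the function names are hypothetical, and the per-nucleus scaling target (`target_sum`) is an assumption because the text does not give the normalization constant.

```python
import math


def passes_qc(n_genes, pct_mito, pct_caspase):
    """Apply the stated filters: drop nuclei with fewer than 600 or more than
    5,000 genes, >2% mitochondrial reads, or >0.1% caspase-gene reads."""
    return 600 <= n_genes <= 5000 and pct_mito <= 2.0 and pct_caspase <= 0.1


def log_normalize(counts, target_sum=10_000):
    """Scale one nucleus to `target_sum` total counts, then take ln(X + 1).

    target_sum is an assumed value, not taken from the source.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("nucleus has no counts")
    return [math.log1p(c * target_sum / total) for c in counts]


# Hypothetical nuclei: (genes detected, % mitochondrial, % caspase reads).
nuclei = [(599, 0.5, 0.0), (600, 2.0, 0.1), (5001, 0.1, 0.0), (2500, 2.5, 0.0)]
kept_flags = [passes_qc(*n) for n in nuclei]
normalized = log_normalize([1, 3], target_sum=4)
```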


snRNA-seq data clustering 

We first regressed out the number of UMIs and the number of genes. Each 
gene was then scaled to unit variance. We then conducted dimensionality 
reduction via PCA using the ARPACK SVD solver in scanpy, computed 
the k-nearest neighbour graph with PCs 1-40 and k=30 nearest neigh- 
bours. Clusters were determined with unsupervised clustering using the 
Louvain algorithm®” and resulted in 24 clusters. Differential expression 
analysis was conducted to find the top 100 genes enriched in each cluster 
with the rank_genes_groups module in scanpy using logistic regression®. 
We annotated clusters post hoc on the basis of known marker genes”? 
among the top 100 enriched genes. For visualization, we embedded the 
profiles with UMAP (uniform manifold approximation and projection”). 


Cell-type-specific differential expression analysis 

To find DEGs between ABX and control mice for each cluster, we used 
statsmodels in Python to implement a mixed linear model for each clus- 
ter c. Specifically, we used the regression Y_{i,c} ~ T + N + (1|B), in which 
Y_{i,c} is the ln(X + 1) expression vector for gene i across all nuclei in 
cluster c, T is a binary variable reflecting membership of the nucleus in 
either ABX or control sample, N is the number of genes detected in each 
nucleus, and B is a categorical variable denoting the 10x channel used for 
each sample to control for batch effects. We used a Bonferroni-corrected 
P value of 10” as the cut-off for significance. For plotting, we used DEGs 
that had a minimum log2(fold change) of 0.31 (absolute fold change 
of 1.24) in either direction, and independently found to be significant 
in at least two clusters. To rank clusters based on the number of DEGs 
(Extended Data Fig. 5b), we first randomly sampled 500 nuclei with 
replacement from each cluster to maintain comparable statistical power 
across clusters and re-ran the mixed linear model as described; all other 
plots and analyses, including GO enrichment, were based on the full list 
of DEGs obtained without downsampling. 
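
The selection of DEGs for plotting can be sketched as a filter over per-cluster model results. This is one reading of the criteria above: a gene is kept if it is significant in at least two clusters and reaches an absolute log2(fold change) of 0.31 in at least one of them. The data layout, the function names and the `p_cutoff` value in the example are hypothetical; the significance cut-off is passed in as a parameter and is not the value used in the study.

```python
def fold_change_from_log2(log2_fc):
    """Absolute fold change corresponding to a log2(fold change)."""
    return 2 ** abs(log2_fc)


def genes_for_plotting(results, p_cutoff, min_abs_log2fc=0.31, min_clusters=2):
    """Select genes from {gene: {cluster: (log2_fc, corrected_p)}}."""
    selected = []
    for gene, per_cluster in results.items():
        significant = [lfc for lfc, p in per_cluster.values() if p < p_cutoff]
        large_enough = any(abs(lfc) >= min_abs_log2fc for lfc in significant)
        if len(significant) >= min_clusters and large_enough:
            selected.append(gene)
    return sorted(selected)


# Hypothetical results for three genes in two clusters.
results = {
    "GeneA": {"c1": (0.40, 1e-5), "c2": (0.20, 1e-4)},   # kept
    "GeneB": {"c1": (0.50, 1e-5), "c2": (0.45, 0.5)},    # significant in one cluster only
    "GeneC": {"c1": (-0.10, 1e-6), "c2": (0.15, 1e-6)},  # effect too small
}
plotted = genes_for_plotting(results, p_cutoff=0.01)  # hypothetical cut-off
fc = fold_change_from_log2(0.31)
```

The last line reproduces the correspondence stated above: a log2(fold change) of 0.31 is an absolute fold change of about 1.24.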


Vagotomy 

The following subdiaphragmatic vagotomies and pyloromyotomy pro- 
cedure were modified from previously described procedures”. Mice 
were anaesthetized via IP injection of a ketamine (144 mg/kg)/xylazine 
(13 mg/kg) cocktail. A midline incision was made and the stomach was 
retracted inferiorly to expose the distal oesophagus and the gastroe- 
sophageal junction. The anterior (left) and posterior (right) branches 
of the vagus nerve were identified running alongside the oesophagus 
and severed distal to the hepatic branches. The stomach was then placed 
back into the anatomical position and a pyloromyotomy was performed 
using a bent 23-gauge needle. The superficial muscular layers were 
incised in a longitudinal fashion and closed transversely with 4-0 vicryl 
sutures. The peritoneum was then closed with a running 4-0 vicryl suture 
and the skin approximated with staples. Mice were allowed to recover 
from anaesthesia under a heat lamp and returned to the colony room 
once awake and ambulating. For non-vagotomized mice, the vagus nerve 
was gently exposed without further manipulation. Mice were monitored 
for seven days. The completeness of subdiaphragmatic vagotomy was 
verified by examining fluorescent label of the dorsal motor vagal nucleus 
(DMV) on brainstem sections one week after intraperitoneal injection 
of FluoroGold. The absence of fluorescent label in DMV neurons was 
accepted as a marker of complete vagotomy. 


Mass spectrometry 

High-resolution LC-MS analysis was performed on a Dionex 3000 UPLC 
coupled with a Thermo Q-exactive high-resolution mass spectrometer 
equipped with a HESI ion source. Metabolites were separated using a 
water-acetonitrile gradient on an Agilent Zorbax Eclipse XDB-C18 column 
(150 mm × 2.1 mm, particle size 1.8 μm) maintained at 40 °C; solvent A: 
0.1% formic acid in water; solvent B: 0.1% formic acid in acetonitrile. 
The A-B gradient started at 1% B for 1 min after injection and increased 
linearly to 100% B at 15 min, using a flow rate of 0.5 ml/min. Mass spec- 
trometer parameters: spray voltage 2.9 kV, capillary temperature 320 °C, 
probe heater temperature 300 °C; sheath, auxiliary, and spare gas 70, 
2, and 0 ml/min, respectively; S-lens RF level 55, resolution 140,000 at 
m/z 200, AGC target 1 x 10°. The instrument was calibrated weekly with 
positive and negative ion calibration solutions (Thermo-Fisher). Each 
sample was analysed in negative and positive modes using a m/z range 
of 100 to 1,500. 
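
The solvent gradient described above is piecewise linear and can be written as a function of time after injection. In this sketch the function name is hypothetical, and holding at 100% B after 15 min is an assumption, since the text does not describe the end of the run.

```python
def percent_b(t_min):
    """Percentage of solvent B at t_min minutes after injection.

    1% B for the first minute, then a linear ramp to 100% B at 15 min.
    Holding at 100% B beyond 15 min is assumed, not stated in the source.
    """
    if t_min < 0:
        raise ValueError("time after injection cannot be negative")
    if t_min <= 1.0:
        return 1.0
    if t_min >= 15.0:
        return 100.0
    return 1.0 + (t_min - 1.0) * (100.0 - 1.0) / (15.0 - 1.0)


# Composition at the midpoint of the ramp (8 min after injection).
midpoint = percent_b(8.0)
```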


Feature detection, characterization and compound synthesis 
LC-MS RAW files from triplicate faecal, serum and CSF samples from 
adult ex-GF_fostered, Ctrl_fostered and GF mice were converted to 
mzXML (profile mode) using MSConvert (ProteoWizard), followed 
by analysis using a customized XCMS R-script based on the centWave 
XCMS algorithm to extract features*°. Resulting tables of all detected 
features were used to compute ex-GF_fostered mice versus GF mice and 
Ctrl_fostered mice versus GF mice peak area ratios. To select differential 
features, we applied a filter that retained entries with peak area ratios 
larger than 2 (down in GF mice) or smaller than 0.5 (up in GF mice). We 
manually curated the resulting list to remove false positive entries—that 
is, features that upon manual inspection of raw data were not differential. 
For the features that were verified to be differential, we examined elu- 
tion profiles, isotope patterns, and MS1 spectra to find molecular ions 
and remove adducts, fragments, and isotope peaks. 
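
The peak-area ratio filter used to select differential features can be sketched as follows. The function name and labels are hypothetical, and the handling of a feature that is absent in GF mice (zero peak area) is an assumption, because the text does not say how undetected features were treated.

```python
def classify_feature(area_colonized, area_gf, upper=2.0, lower=0.5):
    """Classify a feature by its colonized-versus-GF peak area ratio.

    Ratios above `upper` mean the feature is lower in GF mice; ratios below
    `lower` mean it is higher in GF mice; anything else is not differential.
    """
    if area_gf == 0:
        # Assumption: a feature detected only in colonized mice counts as
        # down in GF mice; one detected in neither group is not differential.
        return "down_in_GF" if area_colonized > 0 else None
    ratio = area_colonized / area_gf
    if ratio > upper:
        return "down_in_GF"
    if ratio < lower:
        return "up_in_GF"
    return None


# Hypothetical peak areas (colonized mice, GF mice).
labels = [classify_feature(300.0, 100.0),
          classify_feature(40.0, 100.0),
          classify_feature(150.0, 100.0)]
```

Features that pass this filter are candidates only; as described above, they were still checked manually against the raw data.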

The structures of the four differential compounds were confirmed 
by coinjection with synthesized or commercial samples. Phenyl sulfate 
and indoxyl sulfate were purchased from TCI America and Sigma- 
Aldrich, respectively. Pyrocatechol sulfate and 3-(3-sulfooxyphe- 
nyl)propanoic acid were prepared following a previously published 
procedure”. To a stirred solution of catechol (Sigma-Aldrich, 0.55 g, 
5 mmol) or 3-(4-hydroxyphenyl)propionic acid (Sigma-Aldrich, 0.88 g, 
5 mmol) in dry pyridine (2.5 ml), sulfur trioxide pyridine complex (0.88 
g, 6 mmol) was added at room temperature. The resulting mixtures 
were heated in an oil bath at 45 °C and stirred for 2 h. The reactions 
were then allowed to cool to ambient temperature and transferred 
separately to flasks each containing 25 ml of 1 N KOH cooled in an ice 
bath. To each of the aqueous mixtures was added 100 ml of 2-pro- 
panol and the two reactions were left at 4 °C for 16 h. At this point, the 
products were filtered off as white precipitates. The crude products 
were taken up in 50 ml (3:1 ethanol:water) and heated to reflux, hot 
filtered, and placed in the fridge for recrystallization. This last step 
was then repeated. Totals of 210 and 270 mg of pyrocatechol sulfate 
and 3-(3-sulfooxyphenyl)propanoic acid were obtained, correspond- 
ing to yields of about 20%. 


Statistical analysis 

Statistical tests were performed with Prism (GraphPad). Unless specifi- 
cally indicated otherwise, Student’s t-tests were used to compare end- 
point means of different groups. Error bars depict the s.e.m. 


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Code availability 


The algorithm used for automated scoring of freezing behaviour is 
available at https://www.seas.upenn.edu/~molneuro/software.html. 
The algorithm used for motion artefact correction in 2P calcium imaging 
data is available at 
http://www.cs.cmu.edu/~kangli/code/Image_Stabilizer.html. All other 
analysis code is available from the corresponding 
author upon reasonable request. 


Data availability 


RNA-seq data, 16S rRNA-seq data and snRNA-seq data are available at 
Gene Expression Omnibus and BioProject under accession numbers 
GSE134808, PRJNA556230 and GSE135326, respectively. All datasets 
generated and/or analysed during the current study are presented in 
this published article, the accompanying Source Data or Supplemen- 
tary Information, or are available from the corresponding author upon 
reasonable request. 
Extended Data Fig. 1 | Antibiotic treatment results in bacterial community 
restructuring. a-c, Food intake (a), water intake (b) and weight gain (c) of the 
mice measured using the Promethion Metabolic Cage System. Antibiotic 
treatment was started two weeks before the experiment and continued for the 
duration of the experiment. For food (a) and water intake (b), the mice were 
acclimated to the system for the first four days followed by one day of data 
collection. Body mass (c) of the mice was measured at the beginning (Start) and 
the end (End) of the 5-day experiment. n=4 mice per group. Mean ± s.e.m. Total, 
full day. Light and Dark denote the light and dark periods of the 12-h cycle. d, 16S 
rDNA gene copies as quantified by real-time PCR with reverse transcription 
(RT-PCR) from stool pellets collected from control or ABX mice. Data pooled 
from two independent experiments. n=7 mice per group. Mean ± s.e.m.; 
unpaired two-sided t-test. e-g, PCoA (e), alpha-diversity Shannon index (f) and 
taxonomic classification (g) of 16S rDNA in stool pellets collected from control 
or ABX mice. Control n=4, ABX n=5. For PCoA plot PERMANOVA: F=33.579, 
Df=1, P=0.00804. For phylogenetic classification ‘f_’, ‘g_’, ‘uncl_c_’, ‘uncl_d_’ and 
‘uncl_o_’ stand for ‘family_’, ‘genus_’, ‘unclassified_class_’, ‘unclassified_domain_’ 
and ‘unclassified_order_’, respectively. ‘uncl_d_Bacteria’ matches exactly to 
mitochondria or chloroplasts, probably from the food. Mean ± s.e.m. in f. 
Extended Data Fig. 2 | Antibiotic-treated mice retain deficits in extinction 
learning after vagotomy. Fear extinction in sham-operated control (Ctrl_Sham) 
or ABX mice (ABX_Sham) and in vagotomised ABX mice (ABX_Vx) over the 
course of 3 days or sessions. Ctrl_Sham n=10, ABX_Sham n=10, ABX_Vx n=12. 
Mean ± s.e.m.; AUC was calculated for each mouse within each group, followed 
by unpaired two-sided t-test between groups. Pvalues are as follows: i, 2.57 x 107; 
ii,9.21x10°. 
Extended Data Fig. 3 | Comparable percentages and numbers of CD45^high 
leukocytes in the brains of control and ABX or GF mice. a, Gating strategy for 
T cells, B cells, dendritic cells (DCs) and macrophages (MΦ) in the brain. 
b, Population frequencies and numbers of brain-resident CD45^high leukocytes in 
control and ABX mice. c, d, Population frequencies of CD4+ T cells, CD8+ T cells, 
CD19+ B cells (c), CD11c+ DCs and F4/80+ macrophages (d) gated on brain- 
resident CD45^high leukocytes in control and ABX mice. e, Population frequencies 
and numbers of brain-resident CD45^high leukocytes in control and GF mice. 
f, g, Population frequencies of CD4+ T cells, CD8+ T cells, CD19+ B cells (f), CD11c+ 
DCs and F4/80+ macrophages (g) gated on brain-resident CD45^high leukocytes in 
control and GF mice. h, Gating strategy of total myeloid cells and Ly6C^high 
monocytes in the brain. i, j, Population frequencies of total myeloid cells and 
Ly6C^high monocytes gated on brain-resident CD45^high leukocytes in control and 
ABX (i) or GF (j) mice. Data in b, c, g, j are representative of three independent 
ii=0.0004. l, Fear extinction of SPF-Rag1−/− and GF-Rag1−/− mice in the single- 
session 30-tone fear extinction assay. n=7 mice per group. Mean ± s.e.m.; AUC 
was calculated for each mouse within each group followed by unpaired two- 
sided t-test between groups. P value is shown. 


[Figure residue: PCA plot (PC1: 92% variance; PC2: 3% variance); c-FOS immunofluorescence panels (Ctrl, GF)] 

Extended Data Fig. 4 | Comparable transcriptomes of mPFCs dissected from 
control and ABX mice in the absence of fear conditioning and extinction. a, 
PCA of genome-wide transcriptional profiles of mouse mPFC in the absence of 
fear conditioning and extinction. Control n=3, ABX n=4. PERMANOVA test was 
used: F=2.52, Df=1, P=0.17. b, Volcano plot of differential expression between 
control (negative log2FC) and ABX (positive log2FC) groups. DEGs (defined as 
FDR<0.1, DESeq2 Wald test) are shown in red. c-f, Immunofluorescence 
staining of c-FOS (red) (c, e) and the density of c-FOS+ neurons (d, f) in the BLA 
(c, d) or IL (e, f) of control and GF mice 90 min after classical fear extinction 
session 3. Data pooled from two independent experiments. n=6 mice per group. 
Mean ± s.e.m.; unpaired two-sided t-tests. P values are shown. Scale bar, 200 μm. 
Extended Data Fig. 5 | Gene expression patterns of individual cell subsets in 
the mPFC. a, Proportion of expressing cells (dot size) and mean normalized 
expression of representative marker genes (columns) associated with the cell 
clusters shown in Fig. 2a (rows). Clusters are labelled with post facto annotation 
based on known marker genes. Ambiguous clusters expressing multiple 
canonical markers across cell types are annotated with both (for example, 
exPFC/astrocyte), and are likely to represent doublets. b, Number of 
significantly differentially expressed genes (z-test calculated on coefficients of 
mixed linear model, Bonferroni-corrected P<10”) by cluster after 
downsampling each cluster to 500 nuclei, ranked from highest to lowest 
(clusters of doublets and undetermined annotations not included). 
exPFC, glutamatergic excitatory neurons from the PFC; GABA, γ-aminobutyric 
acid (GABA)ergic interneurons; OPC, oligodendrocyte progenitor cells; 
MO, myelinating oligodendrocytes. 
Extended Data Fig. 6 | Differential gene expression between control and 
ABX mice in individual clusters of mPFC cells. Differential expression of ABX 
versus control (log2FC) in each cluster in Fig. 2a and the associated significance. 
Blue, genes that are significantly differentially expressed (z-test calculated on 
coefficients of mixed linear model, Bonferroni-corrected P<10”). 
Extended Data Fig. 7 | Differentially expressed genes in ABX versus control 
mPFC samples shared by all excitatory neuronal subsets. Mean fold change in 
expression in excitatory neurons (columns) from Fig. 2a of genes (rows) that 
Extended Data Fig. 8 | Differentially expressed genes in ABX versus control 
mPFC samples shared by multiple cell types. Mean fold change in expression 
across all cell clusters (columns) from Fig. 2a of genes (rows) that were 
significantly differentially expressed (z-test calculated on coefficients of mixed 
linear model, Bonferroni-corrected P<10”) in at least 4 clusters, and with 
absolute log2FC ≥ 0.31 in at least 1 cluster. 
Extended Data Fig. 9 | Microglia in GF and ABX mice exhibit a 
developmentally immature phenotype. a, Population frequencies and 
numbers of microglia in control and GF mice. b, Representative flow cytometry 
histogram and mean fluorescence intensity (MFI) of F4/80 staining on microglia 
from control and GF mice. c, Representative flow cytometry plots and 
population frequencies of CSF1R+ microglia in control and GF mice. 
d, Representative flow cytometry histogram and MFI of CSF1R expression gated 
on CSF1R+ microglia from control and GF mice. Data in a-d are representative of 
three independent experiments, n=4 mice per group. e, Population frequencies 
and numbers of microglia in control and ABX mice. f, Representative flow 
cytometry histogram and MFI of F4/80 staining on microglia from control and 
ABX mice. g, Representative flow cytometry plots and population frequencies 
of CSF1R+ microglia in control and ABX mice. h, Representative flow cytometry 
histogram and MFI of CSF1R expression gated on CSF1R+ microglia from control 
and ABX mice. Data in e are pooled from two independent experiments, n=8 
mice per group. Data in f-h are representative of two independent experiments, 
n=4 mice per group. Mean ± s.e.m.; unpaired two-sided t-test. P values shown. 
Extended Data Fig. 10 | Downregulation of metabolites in GF mice. 
a, Enzyme-linked immunosorbent assay (ELISA) quantification of plasma 
corticosterone in control and ABX mice. Data pooled from three independent 
experiments. Control n=12; ABX n=11. b, ELISA quantification of plasma 
corticosterone in control and GF mice. Data pooled from three independent 
experiments. Control n=12; GF n=11. Mean ± s.e.m. c, Structures of phenyl 
sulfate, pyrocatechol sulfate, 3-(3-sulfooxyphenyl)propanoic acid and indoxyl 
sulfate. d, Relative abundances of phenyl sulfate, pyrocatechol sulfate, 
3-(3-sulfooxyphenyl)propanoic acid and indoxyl sulfate in faecal samples from 
Ctrl_fostered, GF and ex-GF_fostered mice as determined by LC-MS. n=3 mice 
per group. e, Relative abundances of phenyl sulfate, pyrocatechol sulfate, 
3-(3-sulfooxyphenyl)propanoic acid and indoxyl sulfate in CSF samples from 
control and GF mice as determined by LC-MS. Data are representative of two 
independent experiments, n=8 mice per group. Mean ± s.e.m.; unpaired two- 
sided t-test. P values shown. f, Schematic of the microbiota-gut-brain axis in fear 
extinction learning. Our data inform a model in which alterations in the 
microbiota and their metabolites influence neuronal function and learning- 
related plasticity, which may be due to altered microglia-mediated synaptic 
pruning, and subsequently regulate fear extinction behaviour. 
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement 


A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly 


The statistical test(s) used AND whether they are one- or two-sided 
A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons 

Software and code 


Policy information about availability of computer code 


Data collection Graphic State v4.0, Stereo Investigator v9, FV31S, ZEN Black v2.6, BD FACSDiva v8.0.1, QuantStudio Real-Time 
PCR software v1.0 


Data analysis MATLAB R2015a, Prism 7, Fiji, FlowJo 10.4.0, Autotyping 15.04, Image Stabilizer Plugin for ImageJ, CASAVA v1.8.2, FLEXBAR v2.4, STAR 
v2.3.0, Rsubread, DESeq2 v1.18, vegan R package (https://CRAN.R-project.org/package=vegan), USEARCH v11, phyloseq R package, Cell 
Ranger v2.0, Scanpy v1.4.0, Louvain algorithm, Python, MSConvert, centWave XCMS algorithm 
Data 


Policy information about availability of data 
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: 


- Accession codes, unique identifiers, or web links for publicly available datasets 
- A list of figures that have associated raw data 
- A description of any restrictions on data availability 


RNA-seq data, 16S rRNA-seq data and single-nucleus RNA-seq data are available at Gene Expression Omnibus and BioProject under accession numbers GSE134808, 
PRJNA556230 and GSE135326, respectively. 
Data exclusions Samples with significant drift during microscopy (that is, tracked cells or regions that went out of focus or out of frame) were excluded from 
subsequent analysis. These exclusion criteria were not pre-established, although they are standard in live time-lapse imaging studies. 


Replication Experiments were repeated with at least two to three biologically independent replicates for all results presented in the manuscript. If the group size 
was small (due to limited availability of reagents or mouse strains), data from replicate experiments were pooled for graphical representation. 
All replicates are biological replicates obtained from biologically independent experiments. 


Randomization We did not use randomization to assign animals to experimental groups. Littermate controls were used whenever possible, so age did not 
constitute a variable (and was matched for non-littermates). 


Blinding For dendritic spine imaging data analysis, raters were blinded to experimental conditions. All other animal studies were not blinded: because treatment 
and experimental analysis could not be separated, blinding of the investigators was not possible. 
system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response. 


Materials & experimental systems (n/a | Involved in the study) 
Antibodies 
Eukaryotic cell lines 
Palaeontology 
Animals and other organisms 
Human research participants 
Clinical data 

Methods (n/a | Involved in the study) 
ChIP-seq 
Flow cytometry 
MRI-based neuroimaging 


Antibodies 


Antibodies used The antibodies are described below. All antibodies were purchased from BD, eBioscience (Thermo Fisher), Biolegend, xxx. All 
antibodies were validated by manufacturers and in previous publications. 

Antibodies for flow cytometry: 
CD16/CD32 - clone 93 - purified - Biolegend - https://www.biolegend.com/en-us/products/purified-anti-mouse-cd16-32-antibody-190 
CD45 - clone 30-F11 - BV605 - Biolegend - https://www.biolegend.com/en-us/products/brilliant-violet-605-anti-mouse-cd45-antibody-8721 
CD4 - clone RM4-5 - FITC - eBioscience - https://www.thermofisher.com/antibody/product/CD4-Antibody-clone-RM4-5-Monoclonal/11-0042-82 
CD8a - clone 53-6.7 - PE - Biolegend - https://www.biolegend.com/en-us/products/pe-anti-mouse-cd8a-antibody-155 
CD19 - clone 1D3 - PerCP-Cy5.5 - eBioscience - https://www.thermofisher.com/antibody/product/CD19-Antibody-clone-eBio1D3-1D3-Monoclonal/45-0193-82 
CD11b - clone M1/70 - APC-eF780 - eBioscience - https://www.thermofisher.com/antibody/product/CD11b-Antibody-clone-M1-70-Monoclonal/47-0112-82 
CD11c - clone N418 - APC-eF780 - eBioscience - https://www.thermofisher.com/antibody/product/CD11c-Antibody-clone-N418-Monoclonal/47-0114-82 
F4/80 - clone BM8 - APC - eBioscience - https://www.thermofisher.com/antibody/product/F4-80-Antibody-clone-BM8-Monoclonal/17-4801-82 
Ly6G - clone 1A8-Ly6g - PE - eBioscience - https://www.thermofisher.com/antibody/product/Ly-6G-Antibody-clone-1A8-Ly6g-Monoclonal/12-9668-82 
Ly6C - clone HK1.4 - PE-Cy7 - eBioscience - https://www.thermofisher.com/antibody/product/Ly-6C-Antibody-clone-HK1-4-Monoclonal/25-5932-82 
CSF1R - clone AFS98 - PerCP-eF710 - eBioscience - https://www.thermofisher.com/antibody/product/CD115-c-fms-Antibody-clone-AFS98-Monoclonal/46-1152-82 

All flow antibodies were used at 1:200. 


Antibodies for immunofluorescence staining: 

c-Fos - clone 4 - Santa Cruz - https://www.scbt.com/scbt/product/c-fos-antibody-4 

Synaptophysin - clone SVP-38 - Sigma - https://www.sigmaaldrich.com/catalog/product/sigma/sab4200544?lang=en&region=US 
PSD-95 - clone 7E3-1B8 - Sigma - https://www.sigmaaldrich.com/catalog/product/mm/cp35?lang=en&region=US 

The above three antibodies were used at 1:1,000. 

Donkey anti-Rabbit IgG (H+L) - AF555 - https://www.thermofisher.com/antibody/product/Donkey-anti-Rabbit-IgG-H-L-Highly-Cross-Adsorbed-Secondary-Antibody-Polyclonal/A-31572 
Donkey anti-Mouse IgG (H+L) - AF555 - https://www.thermofisher.com/antibody/product/Donkey-anti-Mouse-IgG-H-L-Highly-Cross-Adsorbed-Secondary-Antibody-Polyclonal/A-31570 

The above two secondary antibodies were used at 1:500. 
All antibody information (including catalogue numbers) can be found via the vendor websites. 


Validation All antibodies are commercially available and validated by the manufacturer. Vendor websites for the antibodies are listed above, 
and the validations can be found there. 


Animals and other organisms 


Policy information about studies involving animals; ARRIVE guidelines recommended for reporting animal research 


Laboratory animals C57BL/6J (Jax 664), Rag1-/- (Jax 2216), Thy1-YFP-H (Jax 3782) and BALB/c (Jax 651) mice were purchased from The Jackson 
Laboratory and bred in-house. Male mice were used at 7-16 weeks of age. In individual experiments, all animals were 
age-matched. All mice were maintained under specific pathogen-free (SPF) conditions on a 12-hour light/dark cycle, and provided 
food and water ad libitum. Germ-free C57BL/6 mice and gnotobiotic mice were maintained at Weill Cornell Medical College, 
New York. 
Wild animals No wild animals included. 
Field-collected samples No field-collected samples included. 
Ethics oversight All protocols were approved by the Weill Cornell Medicine Institutional Animal Care and Use Committees (IACUC), and all mice 
were used in accordance with governmental and institutional guidelines for animal welfare. 


Note that full information on the approval of the study protocol must also be provided in the manuscript. 
Methodology 


Sample preparation Sample preparation is described in Methods in the 'Brain-resident immune cell isolation and flow cytometry' section. 
Instrument A custom configuration Fortessa flow cytometer (BD Biosciences). 
Software FACS DIVA software (BD Biosciences) and FlowJo V10 (Tree Star). 
About half of all bacteria carry genes for CRISPR-Cas adaptive immune systems, which 
provide immunological memory by inserting short DNA sequences from phage and 
other parasitic DNA elements into CRISPR loci on the host genome. Whereas CRISPR 
loci evolve rapidly in natural environments, bacterial species typically evolve phage 
resistance by the mutation or loss of phage receptors under laboratory conditions. 
Here we report how this discrepancy may in part be explained by differences in the 
biotic complexity of in vitro and natural environments. Specifically, by using the 
opportunistic pathogen Pseudomonas aeruginosa and its phage DMS3vir, we show that 
coexistence with other human pathogens amplifies the fitness trade-offs associated 
with the mutation of phage receptors, and therefore tips the balance in favour of the 
evolution of CRISPR-based resistance. We also demonstrate that this has important 
knock-on effects for the virulence of P. aeruginosa, which became attenuated only if the 
bacteria evolved surface-based resistance. Our data reveal that the biotic complexity of 
microbial communities in natural environments is an important driver of the evolution 
of CRISPR-Cas adaptive immunity, with key implications for bacterial fitness and 
virulence. 


P. aeruginosa is a widespread opportunistic pathogen that thrives in 
a range of different environments, including hospitals, where it is a 
common source of nosocomial infections. In particular, it frequently 
colonizes the lungs of patients with cystic fibrosis, in whom it is the 
leading cause of morbidity and mortality. In part fuelled by a renewed 
interest in the therapeutic use of bacteriophages as antimicrobials 
(phage therapy), many studies have examined whether and how 
P. aeruginosa evolves resistance to phage (reviewed in ref. ”). The clinical 
isolate P. aeruginosa strain PA14 has been reported to predominantly 
evolve resistance against its phage DMS3vir by the modification or 
complete loss of the phage receptor (type IV pilus) when grown in 
nutrient-rich medium, despite carrying an active CRISPR-Cas adaptive 
immune system. By contrast, under nutrient-limited conditions, 
the same strain relies on CRISPR-Cas to acquire phage resistance. 
These differences are due to higher phage densities during infections 
in nutrient-rich compared with nutrient-limited conditions, which in turn 
determines whether surface-based resistance (with a fixed cost of resistance) 
or CRISPR-based resistance (infection-induced cost) is favoured 
by natural selection. Although these observations suggest abiotic 
factors are crucial determinants of the evolution of phage resistance 
strategies, the role of biotic factors has remained unclear, even though 
P. aeruginosa commonly co-exists with a range of other bacterial species 
in both natural and clinical settings. We proposed that the presence 
of a bacterial community could drive increased levels of CRISPR-based 
resistance evolution for two main reasons. First, reduced densities of 
P. aeruginosa in the presence of competitors may limit phage amplification, 
and favour CRISPR-based resistance. Second, pleiotropic costs 
associated with the mutation of phage receptors may be amplified during 
interspecific competition. 


Bacterial biodiversity drives CRISPR evolution 


To explore these hypotheses, we co-cultured P. aeruginosa PA14 with 
three other clinically relevant opportunistic pathogens that can co- 
infect with P. aeruginosa, namely Staphylococcus aureus, Burkholderia 
cenocepacia and Acinetobacter baumannii, none of which can be 
infected by or interact with phage DMS3vir (Extended Data Fig. 1). We 
applied a ‘mark-recapture’ approach using a P. aeruginosa PA14 mutant 
carrying streptomycin resistance to monitor the bacterial population 
dynamics and evolution of phage resistance in the focal subpopula- 
tion at 3 days post infection (d.p.i.). This revealed that in nutrient-rich 
lysogeny broth, PA14 evolved significantly higher levels of CRISPR-based 
resistance after infection with 10° plaque-forming units (p.f.u.) of phage 
DMS3vir when co-cultured with other bacterial species than when grown 
in isolation or co-cultured with an isogenic surface mutant (Fig. 1a). In 
addition, we found that these effects were dependent on the identity of 
the species that were present in the mixed culture, with the strongest 
effects being observed in the presence of A. baumannii or a mix of the 
three bacterial species, and an absence of any effect when PA14 was co- 
cultured with an isogenic surface mutant that lacked the phage receptor 
(Fig. 1a, deviance test: relationship between community composition 
and CRISPR; residual deviance (30, n = 36) = 1.81, P = 2.2 × 10”; Tukey 
contrasts: monoculture versus mixed; z = −5.99, P = 3.02 × 10⁻⁸; 
monoculture versus A. baumannii; z = −4.33, P = 0.00023; monoculture versus 
Fig. 1 | Biodiversity affects the evolution of phage resistance. a, Proportion of 
P. aeruginosa that acquired surface modification (SM) or CRISPR-based 
resistance, or remained sensitive at 3 d.p.i. with phage DMS3vir when grown in 
monoculture or polycultures, or with an isogenic surface mutant (6 replicates 


B. cenocepacia; z = −3.76, P = 0.0026; monoculture versus S. aureus; 
z = −2.38, P = 0.26; monoculture versus surface mutant; z = 2.26, P = 0.35). 
Notably, P. aeruginosa densities were strongly reduced in the presence 
of A. baumannii, B. cenocepacia and the mixed community, whereas 
P. aeruginosa dominated the community during competition with 
S. aureus despite the presence of phage DMS3vir (Fig. 1b), which suggests 
a positive relationship between the strength of interspecific competition 
and the levels of CRISPR-based resistance evolution. 

Next, to explore the potential clinical relevance of this observation, 
we performed a similar experiment in artificial sputum medium (ASM), 
which is a nutrient-rich medium that mimics the abiotic environment 
of sputum from patients with cystic fibrosis. This revealed a similar 
pattern as that observed in lysogeny broth, with A. baumannii and the 
community as a whole resulting in a marked increase in CRISPR-based 
resistance (Extended Data Fig. 2). To explore the generality of these 
findings further, we also manipulated the composition of the microbial 
community by varying the proportion of P. aeruginosa versus the other 
pathogens. This revealed that increased CRISPR-based resistance evolu- 
tion occurred across a wide range of microbial community compositions, 
with a maximum effect size when P. aeruginosa made up 50% of the initial 
mixture (Extended Data Fig. 3). An exception to this trend was when the 
P. aeruginosa subpopulation made up only 1% of the total community; 
in this case, sensitive bacteria persisted alongside resistant bacteria 
because of the reduced size of the phage epidemic and hence relaxed 
selection for resistance (Extended Data Fig. 3). Collectively, these data 
suggest that greater levels of interspecific competition contribute to 
the evolution of CRISPR-based resistance. 


Biodiversity amplifies costs of surface resistance 


We hypothesized that reduced population sizes of P. aeruginosa in 
the presence of competitors might explain the increased evolution 
of CRISPR-based resistance, as this leads to smaller phage epidemics, 
which is known to favour CRISPR-based over surface-based resistance. 
However, variation in the force of infection did not seem to have a strong 
role in the observed effects, because even though phage epidemic sizes 
varied depending on the microbial community composition (Extended 
Data Fig. 4), this did not correlate with the levels of evolved CRISPR resistance 
(Extended Data Fig. 5). Moreover, when manipulating the starting 
titres of the DMS3vir phage, we observed no differences in the levels of 
evolved CRISPR-based resistance when P. aeruginosa was co-cultured 
in the presence of the microbial community (Extended Data Fig. 6). An 
alternative explanation for the observed effects may therefore be that 
the fitness cost of surface-based resistance is amplified in the presence 
of other bacterial species; for example, owing to cell-surface molecules 
per treatment, with 24 colonies per replicate, n = 36 biologically independent 
replicates). Data are mean ± s.e.m. b, Microbial community composition over 
time for the mixed-species infection experiments. AB, A. baumannii; BC, B. 
cenocepacia; PA14, P. aeruginosa; SA, S. aureus. 


playing a part in interspecific competition, which again would result 
in stronger selection towards bacteria with CRISPR-based resistance. 
To test this hypothesis, we competed the two phage-resistant phenotypes 
(that is, CRISPR-resistant and surface mutant) in the presence or 
absence of the microbial community, and across a range of phage titres. 
In the absence of the microbial community and phage, CRISPR-resistant 
bacteria had a small fitness advantage over bacteria with surface-based 
resistance, but this advantage disappeared when phage was added and 
as titres increased (Fig. 2a). In the presence of the biodiverse microbial 
community, however, the relative fitness of bacteria with CRISPR-based 
resistance was consistently higher, which demonstrates that mutation 
of the type IV pilus is more costly when bacteria compete with other 
bacterial species (Fig. 2a, linear model: effect of community absence; 
t = −5.54, P = 1.49 × 10’; effect of increasing phage titre; t = −2.41, P = 0.017; 
overall model fit; adjusted R² = 0.41, Fy. 35 = 25.48, P = 7.65 × 10°"). The 
increased fitness trade-off associated with surface-based resistance 
was also observed when the CRISPR- and surface-resistant phenotypes 
competed in the presence of only a single additional species (Fig. 2b, 
two-way ANOVA with Tukey contrasts: overall difference in fitness; F,,.=8.151 
P = 6.31 × 10°; monoculture versus mixed; P = 0.011; monoculture versus 
A. baumannii; P = 0.016; monoculture versus B. cenocepacia; P = 0.022), 
with the exception of S. aureus (Fig. 2d, monoculture versus S. aureus; 
P=0.80), concordant with this species being the weakest competitor 
and inducing the lowest levels of CRISPR-based resistance (Fig. 1). These 
fitness trade-offs therefore explain why P. aeruginosa evolved greater 
levels of CRISPR-based resistance in the presence of the other patho- 
gens, and why this varied depending on the competing species (Fig. 1). 


CRISPR-resistant P. aeruginosa remains virulent 


The evolution of phage resistance by bacterial pathogens is often associated 
with virulence trade-offs when surface structures are modified, 
whereas similar trade-offs have not yet been reported for CRISPR-based 
resistance. We therefore hypothesized that the community context in 
which phage resistance evolves may have important knock-on effects 
for P. aeruginosa virulence. To test this, we used a Galleria mellonella 
infection model, which is commonly used to evaluate the virulence of 
human pathogens. We compared the in vivo virulence of P. aeruginosa 
clones that evolved phage resistance against phage DMS3vir in different 
community contexts by injecting larvae with a mixture of clones that 
had evolved phage resistance in either the presence or absence of the 
mixed bacterial community (Extended Data Fig. 3c). By taking the time 
until death as a proxy for virulence, we found that the evolution of phage 
resistance in the presence of a microbial community was associated with 
greater levels of P. aeruginosa virulence than when phage-resistance 
Fig. 2 | Biodiversity amplifies fitness costs associated with surface-based 
resistance. a, Relative fitness of a P. aeruginosa clone with CRISPR-based 
resistance after competing for 24 h against a surface-modification clone at 
varying titres of phage DMS3vir in the presence or absence of a mixed microbial 
community. Regression slopes with shaded areas corresponding to 95% 
confidence interval (n=144 biologically independent samples). b, Relative 
fitness after competition in the absence of phage, but in the presence of other 
bacterial species individually or as a mixture. Data are mean and 95% confidence 
intervals (n=144 biologically independent samples). 


evolved in monoculture, and remained similar to that of the ancestral 
PA14 strain (Fig. 3a, Cox proportional-hazards model with Tukey con- 
trasts: community present versus absent; z= 5.85, P=1 10+; ancestral 
PA14 versus community absent; z=4.42, P=1x 107; ancestral PA14 versus 
community present; z= -1.30, P= 0.38; overall model fit; LRT, = 51.03, 
n=376, P=5x10™). These data, in combination with the fact that the 
type IV pilus is a well-known virulence factor, are consistent with the 
notion that the mechanism by which bacteria evolve phage resistance 
has important implications for bacterial virulence. To test this more 
directly, we next infected larvae with each individual P. aeruginosa clone 
for which we had previously determined the mechanism that underlies 
evolved phage resistance (Extended Data Fig. 3c), again using the time 
until death as a measure of virulence. This showed that bacterial clones 
with surface-based resistance—unlike those with CRISPR-based resist- 
ance—both had markedly reduced swarming motility (as expected for 
mutations in the type IV pilus) (Fig. 3b; one-way ANOVA with Tukey 
contrasts: overall effect; F,5,,= 472.5, P=2.2 x 10; sensitive versus 
CRISPR; P= 0.87; CRISPR versus surface mutant; P=1 10>) and impaired 
virulence compared with phage-sensitive bacteria (Fig. 3c; Cox pro- 
portional-hazards model with Tukey contrasts: surface mutant versus 
CRISPR; z= -2.37, P= 0.045; sensitive versus CRISPR; z= 2.10, P= 0.10; 
surface mutant versus sensitive; z=—4.23, P=1x 10°; overall model fit; 
LRT,=48.66,n=981, P=2 x10). Similar virulence trade-offs were also 
observed when larvae were injected with P. aeruginosa PA14 clones that 
had evolved surface-based resistance against phage LMA2, which uses 
lipopolysaccharide (LPS) as a receptor (Extended Data Fig. 7). 
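
To make the 'time until death' measure concrete, the following is a minimal sketch of how a median lethal time can be read off a Kaplan-Meier survival estimate. It is illustrative only: the analyses reported here used a Cox proportional-hazards model, and the function names and the five-larva data set are hypothetical rather than taken from the study.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct death time.

    times  -- observation time for each larva (hours, hypothetical units)
    events -- 1 if the larva died at that time, 0 if it was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all larvae that share the same observation time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def median_lethal_time(curve):
    """First time at which estimated survival is at or below 50%."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached during the observation window


# Hypothetical data: five larvae, one censored at 18 h.
example_times = [10, 12, 14, 18, 20]
example_events = [1, 1, 1, 0, 1]
example_curve = kaplan_meier(example_times, example_events)
example_lt50 = median_lethal_time(example_curve)
```

A lower median lethal time corresponds to a more virulent inoculum, which is the direction of comparison used in the text.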


Discussion 


We have shown that the evolutionary outcome of bacteria-phage interactions 
can be fundamentally altered by the microbial community context. 
Although conventionally studied in isolation, these interactions are 
usually embedded in complex biotic networks of interacting species, 
and it is becoming increasingly clear that this can have key implications 
for the evolutionary epidemiology of infectious disease. Our work 
shows that the community context can shape the evolution of different 
host-resistance strategies. Specifically, we find that the interspecific 
interactions between four bacterial species in a synthetic microbial 
community can have a large effect on the evolution of phage-resistance 
mechanisms by amplifying the constitutive fitness cost of surface-based 
resistance. The finding that biotic complexity matters complements 
previous work on the effect of abiotic variables and force of infection 
on the evolution of phage resistance. The data presented here suggest 
that the effect of biotic complexity on the evolution of CRISPR-based 
resistance is stronger than that of variation in phage abundance, which is 
consistent with the observation that in the presence of the polymicrobial 
Fig. 3 | Evolution of phage resistance affects in vivo virulence. a, Time until 
death (given as the median ± one standard error) after infection with PA14 clones 
that evolved phage resistance in either the presence or the absence of a mixed 
microbial community (n = 376 biologically independent samples, analysed using 
a Cox proportional-hazards model with Tukey contrasts). LT50, median lethal 
time. b, The effect of the type of evolved phage resistance (CRISPR-based or 
surface-modification-based) on bacterial motility (n = 981 biologically 
independent samples). Box plots show the median, the twenty-fifth and 
seventy-fifth percentiles, the interquartile range, and outliers 
shown as dots. c, The effect of the type of resistance on in vivo virulence (time 
until death, given as the median ± one standard error; n = 981, analysed using a 
Cox proportional-hazards model with Tukey contrasts). 
community, bacteria with CRISPR-based resistance outcompeted bacteria 
with surface-based resistance at all phage titres (Fig. 2). The amplified 
fitness cost of surface mutation also suggests that the type IV pilus has 
an important role in interspecific competition. Although future work 
will be crucial to understand the detailed molecular mechanism that 
underpins these effects, and to generalize the findings described here 
to other bacterial species and strains, we speculate that the way in which 
the composition of the microbial community drives the evolution of 
phage-resistance strategies may be important in the context of phage 
therapy. Primarily, the absence of detectable trade-offs between CRISPR- 
based resistance and virulence, as opposed to when bacteria evolve 
surface-based resistance, suggests that the evolution of CRISPR-based 
resistance can ultimately influence the severity of disease. Moreover, 
the evolution of CRISPR-based resistance can drive more rapid phage 
extinction, and may in a multi-phage environment result in altered 
patterns of cross-resistance evolution compared with surface-based 
resistance. The identification of the drivers and consequences of 
CRISPR-resistance evolution might help to improve our ability to predict 
and manipulate the outcome of bacteria-phage interactions in 
both natural and clinical settings. 
Methods 


All statistical analyses were done using R v.3.5.1, and the Tidyverse 
package v.1.2.1. All G. mellonella mortality analyses were done using 
the Survival package v.2.38. No statistical methods were used to predetermine 
sample size. The experiments were not randomized, and 
investigators were not blinded to allocation during experiments and 
outcome assessment. 


Bacterial strains and viruses 

We used a marked P. aeruginosa UCBPP-PA14 mutant carrying a 
streptomycin-resistance gene inserted into the genome using pBAM1 
(referred to as the ancestral PA14 strain). The wild-type PA14 
bacteriophage-insensitive mutant with two CRISPR spacers (BIM2), the surface 
mutant derived from the PA14 csy3::LacZ strain, and phage DMS3vir 
and DMS3vir+acrF1 (carrying an anti-CRISPR gene) have all been previously 
described (refs. >”? and references therein). The bacteria used as 
the microbial community were S. aureus strain 13 S44 S9, A. baumannii 
clinical isolate FZ21 and B. cenocepacia J2315, and were all isolated from 
patients at Queen Astrid Military Hospital, Brussels, Belgium. 


Adsorption and infection assays 

Phage infectivity against each of the bacterial species used in this study 
was assessed by spotting serial dilutions of virus DMS3vir on lawns of 
the individual community bacteria, followed by checking for any plaque 
formation after 24 h of growth at 37 °C. Adsorption assays (as shown in 
Extended Data Fig. 1) were performed by monitoring phage titres over 
time, for up to an hour (at 0, 2, 4, 6, 8, 10, 15 and 20 min after infection 
for PA14, and at 0, 5, 10, 20, 40 and 60 min after infection for the other 
bacteria species; for the no-bacterial control, sampling was done at 0 
and 60 min after infection), after inoculating the individual bacteria 
in mid-log phase at approximately 2 x 10° c.f.u. with phage DMS3vir at 
2x10° p.f.u. (final multiplicity of infection = 0.001). Adsorption assays 
were carried out in falcon tubes containing 15 ml LB medium, incubated 
at 37 °C while shaking at 180 r.p.m. (three independent replicates per 
experiment). At each time point, 50 µl of sample was transferred to 
pre-cooled Eppendorf tubes on ice, containing 900 µl LB medium and 50 µl 
chloroform, before vortexing for 10 s. After sampling was completed, 
all Eppendorf tubes were centrifuged at 13,000 r.p.m. at 4 °C for at least 5 
min, after which 300 µl of the supernatant was extracted, diluted and 
spotted onto lawns of P. aeruginosa followed by checking for plaque 
formation after 24 h of growth at 37 °C. 
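
The arithmetic behind the adsorption assay can be sketched as follows. This is a minimal illustration, not the authors' protocol code: the function names are hypothetical, the example counts are chosen only so that their ratio reproduces the stated multiplicity of infection of 0.001, and the dilution step assumes that only the 900 µl of LB medium dilutes the aqueous sample (chloroform is treated as a separate phase).

```python
def multiplicity_of_infection(pfu: float, cfu: float) -> float:
    """Return phage particles per bacterial cell (p.f.u. / c.f.u.)."""
    if cfu <= 0:
        raise ValueError("cfu must be positive")
    return pfu / cfu


def sampling_dilution_factor(sample_ul: float, diluent_ul: float) -> float:
    """Return the fold dilution when a sample is mixed into a diluent."""
    if sample_ul <= 0:
        raise ValueError("sample volume must be positive")
    return (sample_ul + diluent_ul) / sample_ul


# Hypothetical counts, chosen only to give the stated MOI of 0.001.
moi = multiplicity_of_infection(pfu=2e3, cfu=2e6)

# 50 µl of sample into 900 µl LB medium (volumes from the text).
# Assumption: the 50 µl of chloroform is not counted as diluent.
fold = sampling_dilution_factor(sample_ul=50, diluent_ul=900)
```

Any titre read from the spotted supernatant would need to be multiplied by this fold dilution, and by any further serial dilution, to recover the titre in the original culture.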


Evolution experiments 

The streptomycin-resistant mutant of the ancestral strain of P. aeruginosa 
was used for all evolution experiments. Evolution experiments 
(shown in Fig. 1 and Extended Data Figs. 2, 3) were performed by inoculating 
60 µl from overnight cultures (containing approximately 10° c.f.u.) 
into glass microcosms containing 6 ml LB medium (Fig. 1 and Extended 
Data Fig. 3), or ASM (Extended Data Fig. 2). One litre of ASM was made 
by mixing 5 g mucin from porcine stomach (Sigma), 4 g low-molecular-mass 
salmon sperm DNA (Sigma), 5.9 mg diethylene triamine pentaacetic 
acid (DTPA) (Sigma), 5 g NaCl (Sigma), 2.2 g KCl (Sigma), 1.81 g Tris base 
(Thermo Fisher Scientific), 5 ml egg yolk emulsion (Sigma) and 250 mg 
of each of 20 amino acids (Sigma), as previously described. Inoculation 
was followed by incubation at 37 °C while shaking at 180 r.p.m. (n = 6 per 
treatment). The polyculture mixes either consisted of approximately 
equal amounts of all four bacterial species or mixes of P. aeruginosa with 
just one additional species where P. aeruginosa made up 25% of the total 
volume used for inoculation (that is, 15 µl of 60 µl), unless otherwise 
indicated (Extended Data Fig. 3). Before inoculation, phage DMS3vir 
was added at 10° p.f.u. (Fig. 1 and Extended Data Fig. 2), or at 10‘ p.f.u. 
(Extended Data Fig. 3). Transfers of 1:100 into fresh broth were done 
daily for a total of three days. In addition, phage titres were monitored 
daily by spotting chloroform-treated lysate dilutions on a lawn of 
P. aeruginosa csy3::LacZ. Downstream analysis to determine whether and 
how bacteria evolved phage resistance was done by cross-streak assays 
and PCR on 24 randomly selected clones per replicate experiment, as 
previously described. 
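
As a rough sketch of how the per-replicate outcome of this screen can be summarized, the snippet below converts the phenotype calls for the 24 clones of one replicate into the proportions of CRISPR-based resistance, surface modification (SM) and sensitivity. The function name, the label strings and the example calls are hypothetical and are not data from the study.

```python
from collections import Counter

# Phenotype labels are illustrative; the text distinguishes CRISPR-based
# resistance, surface modification (SM) and phage-sensitive clones.
PHENOTYPES = ("CRISPR", "SM", "sensitive")


def phenotype_proportions(clone_calls):
    """Return the fraction of clones assigned to each phenotype."""
    if not clone_calls:
        raise ValueError("at least one clone call is required")
    counts = Counter(clone_calls)
    unknown = set(counts) - set(PHENOTYPES)
    if unknown:
        raise ValueError(f"unexpected phenotype labels: {sorted(unknown)}")
    total = len(clone_calls)
    return {p: counts[p] / total for p in PHENOTYPES}


# Hypothetical replicate: 24 clones screened by cross-streak assay and PCR.
example_calls = ["CRISPR"] * 6 + ["SM"] * 15 + ["sensitive"] * 3
example_props = phenotype_proportions(example_calls)
```

Proportions computed this way for each replicate are the kind of quantity plotted per treatment in Fig. 1a.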


DNA extraction and qPCR 

For the experiment shown in Fig. 1, the densities of the different bacterial 
species in the microbial communities over time were determined using 
qPCR. DNA was extracted from all replicas using the DNeasy UltraClean 
Microbial Kit (Qiagen), following the manufacturer’s instructions. Before 
DNA extraction, to ensure lysis of S. aureus, 15 pl lysostaphin (Sigma) at 
0.1mg mI“ was added to 500 pl of sample followed by incubation at 37 °C 
for at least 1h. For P. aeruginosa, A. baumanniiand B. cenocepacia, the 16S 
gene was chosenas the target for the qPCR primers and were as follows: 
PA14 forward primer (PA14-16 s-F), AGT TGGGAGGAAGGGCAGTA; PA14 
reverse primer (PA14-16 s-R), GCTTGCTGAACCACTTACGC; A. baumannii 
forward primer (AB-16 s-F), ATCAGAATGCCGCGGTGAAT; A. baumannii 
reverse primer (AB-16 s-R), ACCGCCCTCTTTGCAGT TAG; B. cenocepacia 
forward primer (BC-16 s-F), ATACAGTCGGGGGATGACGG; B. cenoce- 
pacia reverse primer (BC-16 s-R), TCACCAATGCAGT TCCCAGG. For 
S. aureus, we used qPCR primers that have previously been described”. 
The amplification reactions were performed in triplicate, with Brilliant
SYBR Green reagents (Agilent) in 20 µl reactions made up of 10 µl master
mix, 2 µl primer pair, 0.4 µl dye, and sterile nuclease-free water to a
total volume of 15 µl before adding 5 µl diluted DNA sample. The qPCR
program was as follows: 95 °C for 3 min, 40 cycles at 95 °C for 10 s and
60 °C for 30 s. All qPCR reactions and results were analysed using the
Applied Biosystems QuantStudio 7 Flex Real-Time PCR system. 
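
The reaction composition above leaves the water volume implicit. The following Python sketch works it out from the listed component volumes and scales the pre-template mix for several reactions. The function names and the optional pipetting overage are illustrative assumptions and are not part of the published protocol.

```python
# Component volumes (in microlitres) for one 20 µl reaction, as listed in the text.
MASTER_MIX_UL = 10.0
PRIMER_PAIR_UL = 2.0
DYE_UL = 0.4
PRE_TEMPLATE_TOTAL_UL = 15.0  # volume of the mix before DNA is added
TEMPLATE_UL = 5.0             # diluted DNA sample


def water_volume_ul():
    """Water needed to bring master mix, primers and dye up to 15 µl."""
    water = PRE_TEMPLATE_TOTAL_UL - (MASTER_MIX_UL + PRIMER_PAIR_UL + DYE_UL)
    if water < 0:
        raise ValueError("components exceed the pre-template volume")
    return round(water, 3)


def bulk_mix_ul(n_reactions, overage=0.0):
    """Scale the pre-template mix for n reactions.

    `overage` is a hypothetical fractional surplus for pipetting loss
    (an assumption; the text does not mention one).
    """
    factor = n_reactions * (1.0 + overage)
    return {
        "master_mix": MASTER_MIX_UL * factor,
        "primer_pair": PRIMER_PAIR_UL * factor,
        "dye": DYE_UL * factor,
        "water": water_volume_ul() * factor,
    }


if __name__ == "__main__":
    print("water per reaction (µl):", water_volume_ul())
    print("final reaction volume (µl):", PRE_TEMPLATE_TOTAL_UL + TEMPLATE_UL)
    print("mix for one triplicate:", bulk_mix_ul(3))
```

With the stated volumes, the water makes up the remainder of the 15 µl pre-template mix, and adding the 5 µl DNA sample gives the 20 µl reaction described in the text.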


Competition experiments 

For both competition experiments in Fig. 2, the BIM2 clone was competed
against the surface mutant derived from the PA14 csy3::LacZ
strain®. Bacteria were grown for 24 h in glass microcosms containing
6 ml LB medium, in a shaking incubator at 180 r.p.m. and at 37 °C. For
the experiment in Fig. 2a, the two phenotypes were competed in the 
presence or absence of the mixed microbial community, either without
the addition of phage (n = 36), or infected with phage DMS3vir at
10*, 10° and 108 p.f.u. (n= 12 per treatment). For the experiment shown 
in Fig. 2b, the two phage-resistant phenotypes were again competed 
either in the presence or absence of individual bacterial species or a 
mixed community of all species. P. aeruginosa made up 25% of the total
volume of 60 µl that was used to inoculate the 6 ml of LB medium (n = 24
per treatment). Samples were taken at 0 and 24 h after infection, and
the cells were serially diluted in M9 salts and plated on cetrimide agar
(Sigma) supplemented with approximately 50 µg ml⁻¹ X-gal (to select for
P. aeruginosa, while also differentiating between the CRISPR-resistant 
clones (white) and the surface mutant (blue)). Relative fitness was 
calculated as previously described*”’. 
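
The text defers to earlier work for the fitness calculation and does not restate the formula. As an illustration only, the Python sketch below uses one definition that is common in pairwise competition assays: the ratio of the odds of the focal type at the end to its odds at the start, computed from blue/white colony counts. Whether the authors used exactly this expression is an assumption, and the function and argument names are hypothetical.

```python
def relative_fitness(focal_start, competitor_start, focal_end, competitor_end):
    """Relative fitness of the focal type from colony counts.

    Assumed definition (not confirmed by the text):
        W = [x1 * (1 - x0)] / [x0 * (1 - x1)]
    where x0 and x1 are the focal fractions at 0 h and 24 h.
    """
    x0 = focal_start / (focal_start + competitor_start)
    x1 = focal_end / (focal_end + competitor_end)
    if not (0 < x0 < 1 and 0 < x1 < 1):
        raise ValueError("both types must be present at both time points")
    return (x1 * (1 - x0)) / (x0 * (1 - x1))


if __name__ == "__main__":
    # Hypothetical counts: white (CRISPR-resistant) versus blue (surface mutant).
    print(relative_fitness(50, 50, 75, 25))
```

Under this definition a value of 1 means the two types are equally fit, and values above 1 mean the focal type (here, the white CRISPR-resistant colonies) increased in frequency over the 24 h.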


Virulence assays 

All infection experiments were done using G. mellonella larvae (UK WaxWorms
Ltd). Throughout the experiments, the larvae were stored in
12-well plates, with one larva per well, and were all checked for mortality 
and melanization before injection. Bacterial inoculums were prepared 
depending on the experiment, and were as follows. For the experiment 
in Fig. 3a, all 24 evolved clones from each replicate from the 25% (com- 
munity present) and 100% (community absent) treatments (Extended 
Data Fig. 3) were pooled together by replica (n = 6 per treatment) and 
mixed in 6 ml of LB medium. Each mixture of clones was injected into 
ten individual larvae, with time until death measured as a proxy for 
virulence. This procedure was performed in three independent repeats 
by injecting the same mixtures of bacterial clones into independent 
batches of larvae in separate experiments (total number of larvae = 420). 
To assess virulence of all evolved clones (Fig. 3c), infections were done 
independently using all of the individual PA14 clones from 3 d.p.i. from 
the experiment shown in Extended Data Fig. 3 (n = 1,008). Here (Fig. 3c),
the bacterial inoculums were prepared individually for each clone by
inoculating 200 µl LB medium with 5 µl bacterial sample from freezer
stock, repeated for all individual clones in 96-well plates. Finally, to 
measure whether surface-based resistance against a LPS-specific phage 
was associated with similar virulence trade-offs (Extended Data Fig. 7), 
we isolated P. aeruginosa clones from six independent infection experi- 
ments with phage LMA2. A total of 10 clones per replicate experiment, 
isolated from 3 d.p.i., were phenotypically characterized to confirm 
resistance, and examined by PCR to exclude that resistance was CRISPR- 
based. All 10 clones with LPS-based resistance from the same replicate 
experiment were pooled together in 6 ml of LB medium (n=6), and infec- 
tions of G. mellonella larvae were carried out as described above, with 
each mixture of clones injected into ten individual larvae, performed 
in three independent repeats (total number of larvae = 240). Before 
infection, all bacterial inoculums were grown overnight at 37 °C on an 
orbital shaker (180 r.p.m.) before being diluted by adding 20 µl to 180 µl
of M9 salts. Cell density was then assayed by measuring optical density
at 600 nm (OD600), with 0.1 OD600 being approximately 1 10% c.f.u. ml⁻¹,
before being further diluted to approximately 1 x 10* c.f.u. ml⁻¹, which was
subsequently used for infection by injecting 10 µl into the rear proleg of
individual G. mellonella using a sterile syringe as previously described”. 
Optical density measurements and experimental repeats were taken 
into account during formal data analysis. After infection, larvae were 
incubated at 28 °C, with mortality monitored hourly for up to 48 h. For 
all independent experiments, a control experiment in which larvae were
injected with just M9 salts was included. All work conforms to ethical 
regulations regarding the use of invertebrates, with approval from The 
University of Exeter ethics committee. 
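
Because mortality was monitored for up to 48 h, larvae still alive at the end of the observation window are right-censored. The Python sketch below shows how a Kaplan-Meier estimate and a median time to death can be obtained from such data. It is an illustration with hypothetical times, not the authors' analysis: the Reporting Summary states that survival analyses were run in R with the Survival package, and the figure captions report Cox proportional hazards models.

```python
from fractions import Fraction


def kaplan_meier(times, events):
    """Kaplan-Meier survival steps as a list of (time, survival) pairs.

    times  : hours until death or until observation stopped
    events : True if death was observed, False if the larva was censored
    Fractions keep the step heights exact.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = Fraction(1)
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all larvae that share the same time point.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths:
            survival *= Fraction(n_at_risk - deaths, n_at_risk)
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


def median_survival(curve):
    """First time at which estimated survival is at or below one half."""
    for t, s in curve:
        if s <= Fraction(1, 2):
            return t
    return None  # median not reached within the observation window


if __name__ == "__main__":
    # Hypothetical group of ten larvae: five deaths, five alive at 48 h.
    hours = [18, 20, 20, 24, 30, 48, 48, 48, 48, 48]
    died = [True] * 5 + [False] * 5
    curve = kaplan_meier(hours, died)
    print(curve)
    print("median time to death (h):", median_survival(curve))
```

If fewer than half of the larvae in a group die before 48 h, the median is undefined and the function returns `None`, which is one reason censoring-aware models are preferred over simple means of time to death.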


Motility assays 

Swarming motility of all evolved bacterial clones from the experiment 
shown in Extended Data Fig. 3c (n= 1,008) was assayed by using a 96-well 
microplate pin replicator to stamp the individual clones on 1% agar 
before overnight growth at 37 °C. The diameters of the individual clones 
were then taken as a measure of motility (three replicas per clone). 


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 

Extended Data Fig. 1 | Only P. aeruginosa adsorbs phage DMS3vir. Phage
levels (in p.f.u. ml⁻¹) in minutes after infection of P. aeruginosa PA14 and three
other bacterial species (n = 84 biologically independent replicates). Controls
were carried out in the absence of bacteria. Here, the lines are regression slopes
with shaded areas corresponding to 95% confidence intervals. Linear model:
effect of P. aeruginosa on phage titre over time; t = −3.37, P = 0.0009; S. aureus;
t = 1.63, P = 0.11; A. baumannii; t = 1.20, P = 0.23; B. cenocepacia; t = −0.27, P = 0.79;
overall model fit; F,.3;=4.33, adjusted R?= 0.11, P=3.17x10°. 

Extended Data Fig. 2 | Enhanced CRISPR resistance evolution in ASM.
Proportion of P. aeruginosa that acquired surface modification or CRISPR-based
immunity (or remained sensitive) 3 d.p.i. with phage DMS3vir when grown in
ASM (6 replicates per treatment, with 24 colonies screened from each replicate, 
n=30 biologically independent replicates). Deviance test: relationship between 
community composition and CRISPR; residual deviance (25, n =30) =1.26, 
P=2.2x10"*; Tukey contrasts: monoculture versus mixed; z=-5.30, P=1x 107; 
monoculture versus A. baumannii: z=-5.60, P=1 10‘; monoculture versus B. 
cenocepacia; z = −2.80, P = 0.02; monoculture versus S. aureus; z = −0.76, P = 0.93.
Data are mean ± s.e.m.

Extended Data Fig. 3 | Increased evolution of CRISPR-based resistance
across a range of microbial community compositions over time. Proportion
of P. aeruginosa that acquired surface modification or CRISPR-based immunity
(or remained sensitive) at up to 3 d.p.i. with phage DMS3vir when grown either in
monoculture (100%) or in polyculture mixtures consisting of the mixed
microbial community but with varying starting percentages of P. aeruginosa
based on volume (6 replicates for most samples, with 24 colonies per replicate,
n = 42 biologically independent replicates for a, n = 32 biologically independent
replicates for b, and n = 42 biologically independent replicates for c).
a, Resistance evolution at 1 d.p.i. Data are mean ± s.e.m. Deviance test:
relationship between CRISPR and P. aeruginosa starting percentage at time
point 1; residual deviance (35, n = 42) = 4.42, P = 0.004; 1%; z = −3.27, P = 0.002;
10%; z = 1.21, P = 0.23; 25%; z = 1.62, P = 0.11; 50%; z = 2.20, P = 0.034; 90%; z = 2.07,
P = 0.046; 99%; z = 0.47, P = 0.65; 100%; z = 1.47, P = 0.15. b, Resistance evolution at
2 d.p.i. Data are mean ± s.e.m. Deviance test: relationship between CRISPR and
P. aeruginosa starting percentage at time point 2; residual deviance (25,
n = 32) = 3.86, P=2.5110°%; 1%; z = −2.14, P = 0.04; 10%; z = 1.19, P = 0.25; 25%;
z = 2.07, P = 0.049; 50%; z = 1.89, P = 0.07; 90%; z = 1.12, P = 0.27; 99%; z = 1.21,
P = 0.24; 100%; z = 1.11, P = 0.28. c, Resistance evolution at 3 d.p.i. Data are
mean ± s.e.m. Deviance test: relationship between CRISPR and P. aeruginosa
starting percentage at time point 3; residual deviance (35, n = 42) = 8.24,
P = 0.0004; 1%; z = −3.38, P = 0.002; 10%; z = 2.12, P = 0.04; 25%; z = 2.77, P = 0.009;
50%; z = 3.07, P = 0.004; 90%; z = 2.46, P = 0.019; 99%; z = 1.55, P = 0.13; 100%;
z = 0.87, P = 0.39.

Extended Data Fig. 4 | Microbial community composition affects phage
epidemic size. The DMS3vir phage titres (in p.f.u. ml⁻¹) over time up to 3 d.p.i. of
P. aeruginosa grown either in monoculture (100%) or in polyculture mixtures as
shown in Extended Data Fig. 3. Each data point represents the mean, error bars
denote s.e.m. (n = 171 independent biological samples). Two-way ANOVA:
overall effect of P. aeruginosa starting percentage on phage titre; F, 19;= 14.84,
P=1.1x10".

Extended Data Fig. 5 | No correlation between phage epidemic size and
evolution of CRISPR resistance. The correlation between the proportion of
evolved phage-resistant clones with CRISPR-based resistance and the phage
epidemic sizes (in p.f.u. ml⁻¹) in the presence of other bacterial species, using
data taken from experiments shown in Fig. 1, Extended Data Figs. 2, 3c and 6
(n = 137 biologically independent samples per time point). Correlations are
separated by day, as phage titres were measured daily. Here, the lines are
regression slopes, with shaded areas corresponding to 95% confidence
intervals. Pearson’s product-moment correlation tests between phage titres (at
each day after infection) and levels of CRISPR-based resistance: T=1;t,;,=—0.02, 
P=0.98, R?=-0.002; T= 2; ty3¢= 0.59, P= 0.55, R?= 0.05; T=3; t,3,=—-0.90, P= 0.37, 
R?=-0.08. 

Extended Data Fig. 6 | Starting phage titre does not affect CRISPR evolution
in the presence of a microbial community. Proportion of P. aeruginosa that
acquired CRISPR-based resistance at 3 d.p.i. with varying starting titres of phage 
DMS3vir when grown in polyculture (6 replicates per treatment, with 24 colonies 
per replicate, n=24 biologically independent replicates). Deviance test: start 
phage and CRISPR; residual deviance (20, n= 24) =2.00, P=0.13; Tukey 
contrasts: 10? versus 10*; z=-1.52, P= 0.42; 10* versus 10°; z=—0.76, P=0.87;10° 
versus 10°; z=1.31, P= 0.56; 10 versus 10°; z=-2.24, P=0.11; 107 versus 10°; 
z=-0.99, P=0.75;10* versus 10°; z=0.56, P=0.94. Data are mean +s.e.m. 

Extended Data Fig. 7 | LPS-based phage resistance also affects in vivo
virulence. Time until death (given as median ± one standard error) for
G. mellonella larvae infected with PA14 clones that evolved phage resistance by
LPS modification, compared to the phage-sensitive ancestral (n=209 
biologically independent samples). Cox proportional hazards model with Tukey 
contrasts: sensitive (ancestral) versus LPS; z= 4.81, P=1.49 x 10. overall model 
fit; LRT,=44.94, P=1x10°. 


nature research Corresponding author(s): cae 


Last updated by author(s): Sep 2, 2019 


Reporting Summary 


Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency 
in reporting. For further information on Nature Research policies, see Authors & Referees and the Editorial Policy Checklist. 


Statistics 


For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section. 


n/a | Confirmed 


The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement 


A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly 


The statistical test(s) used AND whether they are one- or two-sided
Only common tests should be described solely by name; describe more complex techniques in the Methods section. 


A description of all covariates tested 


A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons 


A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) 
AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals) 


For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted
Give P values as exact values whenever suitable. 


For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings 


For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes


Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated 


Our web collection on statistics for biologists contains articles on many of the points above. 


Software and code 


Policy information about availability of computer code 


Data collection No software was used for data collection 


Data analysis All data analyses were done using R version 3.5.1 and the Tidyverse package version 1.2.1. The survival analyses were done using the
Survival package version 2.38.


For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers. 
We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information. 


Data 


Policy information about availability of data 
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: 


- Accession codes, unique identifiers, or web links for publicly available datasets 
- A list of figures that have associated raw data 
- A description of any restrictions on data availability 


All data used in this study are available on figshare at 10.6084/m9.figshare.9752903 (will be made public prior to publication).


Field-specific reporting 


Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection. 


[x] Life sciences   [ ] Behavioural & social sciences   [ ] Ecological, evolutionary & environmental sciences




For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf 


Life sciences study design 


All studies must disclose on these points even when the disclosure is negative. 


Sample size All in vitro experiments were performed in 6 independent replicates for sufficient statistical power, in line with common practice in
experimental evolution studies. Larvae infections were performed in 10 replicates with 3 independent repeats for sufficient statistical power, 
and in line with common practice in Galleria virulence assays. All sample sizes were determined based on common practice in experimental 
evolution studies and infection assays using Galleria. 


Data exclusions No data were excluded from the analyses 


Replication We used 6 independent biological replicates per treatment. For virulence assays, there were 10 larvae per treatment, with 3 independent 
repeats. All observations were reproducible. 




Randomization Not relevant to our study - single experimental manipulations of individual variables.


Blinding Blinding not relevant - single experimental manipulations of individual variables. 


Reporting for specific materials, systems and methods 


We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, 
system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response. 


Materials & experimental systems Methods 
n/a | Involved in the study n/a | Involved in the study 
Antibodies ChIP-seq 
Eukaryotic cell lines Flow cytometry
Palaeontology MRI-based neuroimaging 


Animals and other organisms 


Human research participants 


Clinical data 


Animals and other organisms 


Policy information about studies involving animals; ARRIVE guidelines recommended for reporting animal research 
Laboratory animals We used Galleria mellonella (wax moth) larvae. 
Wild animals The study did not involve wild animals 
Field-collected samples The study did not involve samples collected from the field 
Ethics oversight Ethics approval for the use of invertebrates was given by The University of Exeter ethics committee. 


Note that full information on the approval of the study protocol must also be provided in the manuscript. 


Article 


Heterogeneity in old fibroblasts is linked 
to variability in reprogramming and 
wound healing 


https://doi.org/10.1038/s41586-019-1658-5 


Received: 31 March 2016 


Accepted: 5 September 2019 


Published online: 23 October 2019 


Salah Mahmoudi, Elena Mancini, Lucy Xu, Alessandra Moore, Fereshteh Jahanbani,
Katja Hebestreit, Rajini Srinivasan, Xiyan Li, Keerthana Devarajan, Laurie Prélot,
Cheen Euong Ang, Yohei Shibuya, Bérénice A. Benayoun, Anne Lynn S. Chang,
Marius Wernig, Joanna Wysocka, Michael T. Longaker, Michael P. Snyder &
Anne Brunet


Age-associated chronic inflammation (inflammageing) is a central hallmark of ageing’, 
but its influence on specific cells remains largely unknown. Fibroblasts are present in 
most tissues and contribute to wound healing””. They are also the most widely used cell 
type for reprogramming to induced pluripotent stem (iPS) cells, a process that has 
implications for regenerative medicine and rejuvenation strategies*. Here we show that 
fibroblast cultures from old mice secrete inflammatory cytokines and exhibit increased
variability in the efficiency of iPS cell reprogramming between mice. Variability
between individuals is emerging as a feature of old age* ®, but the underlying 
mechanisms remain unknown. To identify drivers of this variability, we performed 
multi-omics profiling of fibroblast cultures from young and old mice that have different 
reprogramming efficiencies. This approach revealed that fibroblast cultures from old 
mice contain ‘activated fibroblasts’ that secrete inflammatory cytokines, and that the 
proportion of activated fibroblasts in a culture correlates with the reprogramming
efficiency of that culture. Experiments in which conditioned medium was swapped 
between cultures showed that extrinsic factors secreted by activated fibroblasts 
underlie part of the variability between mice in reprogramming efficiency, and we have 
identified inflammatory cytokines, including TNF, as key contributors. Notably, old 
mice also exhibited variability in wound healing rate in vivo. Single-cell RNA-sequencing 
analysis identified distinct subpopulations of fibroblasts with different cytokine 
expression and signalling in the wounds of old mice with slow versus fast healing rates. 
Hence, a shift in fibroblast composition, and the ratio of inflammatory cytokines that 
they secrete, may drive the variability between mice in reprogramming in vitro and 
influence wound healing rate in vivo. This variability may reflect distinct stochastic 
ageing trajectories between individuals, and could help in developing personalized 
strategies to improve iPS cell generation and wound healing in elderly individuals. 


Several studies have investigated the effect of ageing and senescence on 
reprogramming’ ”, but a systematic evaluation of how ageing influences
reprogramming is lacking. We examined the influence of old age on the
inflammatory profile of fibroblasts and their ability to reprogram to iPS 
cells (Fig. 1a). Using cytokine profiling, we compared the systemic milieu 
(plasma) and conditioned medium from primary fibroblast cultures 
from young (3 months) and old (28-29 months) mice (Fig. 1a). Plasma 
from old mice showed increased levels of pro-inflammatory cytokines
(for example, IL-6 and TNF), anti-inflammatory cytokines (for example,
IL-4), and chemokines and growth factors (for example, CSF1 (also known 
as MCSF)) compared to plasma from young mice (Fig. 1b, Extended Data 
Fig. 1a, b and Supplementary Table 1a). Conditioned medium from pri- 
mary fibroblast cultures from the ears of old mice also showed enhanced 
levels of pro- and anti-inflammatory cytokines (for example, IL-6 and 
TNF, and IL-4, respectively; Fig. 1b, Extended Data Fig. 1c, d and
Supplementary Table 1b). Similarly, inflammatory cytokines increased with

Fig. 1 | Primary fibroblasts from old mice secrete inflammatory cytokines
and show increased variability in reprogramming efficiency between mice. 
a, Experimental schematic. Young mice, 3 months old; old mice, 28-29 months 
old. OSKM, OCT4, SOX2, KLF4 and MYC. b, Top, age-dependent changes in 
cytokine levels in plasma and conditioned medium from fibroblasts or iPS cells 
(Extended Data Fig. 1a, g, h). ND, not detected. Bottom, cytokine profiles of 
conditioned medium from primary cultures (passage 3) of ear fibroblasts from 
young (3 months, n= 24) and old (29 months, n= 24) male mice (3 independent 
experiments). Box-and-whisker plots of log,-transformed fold change in mean 
fluorescence intensity (MFI) compared to the median of young fibroblasts. Box 
plots depict median and interquartile range, with whiskers indicating minimum 
and maximum values. **P< 0.01, ***P< 0.001; two-tailed Wilcoxon rank-sum test 
with Benjamini-Hochberg correction. Exact P values can be found in
Supplementary Table 1b. c, Reprogramming efficiency assessed by alkaline
phosphatase (AP) staining of cultures of ear fibroblasts obtained from young
(3 months, n = 44), middle-aged (12 months, n = 11) and old (28-29 months, n = 53)
mice (7 independent experiments). The log,-transformed fold change over the
median of young mice is shown. Each dot represents a fibroblast culture from
one mouse. P values, Fligner-Killeen test to assess differences in variance
between age groups with Benjamini-Hochberg correction. 


age in conditioned medium from lung fibroblasts and human primary 
fibroblasts (Extended Data Fig. 1e, f and Supplementary Table 1c, d).
Thus, primary cultures of fibroblasts from old mice exhibit a secretory
inflammatory profile that overlaps in part with that of the systemic 
milieu (Fig. 1b and Extended Data Fig. 1h). 

To systematically test the effect of age on iPS cell reprogramming, 
we derived independent fibroblast cultures from a total of 108 young, 
middle-aged and old mice. We induced reprogramming by expressing
human OCT4 (also known as POU5F1), KLF4, SOX2 and MYC®, and
assessed reprogramming efficiency using alkaline phosphatase (AP)
and stage-specific embryonic antigen 1 (SSEA1) staining” (Fig. 1a and
Extended Data Fig. 1i-l). We did not observe a significant change in mean
reprogramming efficiency with age (Fig. 1c and Extended Data Fig. 1l).
However, there was increased variability between mice in reprogramming
efficiency with age, with cultures from some old mice reprogramming
better and some worse than cultures from young mice (Fig. 1c and
Extended Data Fig. 1l). A similar age-dependent increase in variability
in reprogramming efficiency was observed in chest fibroblast cultures
(Extended Data Fig. 1m). Reprogramming efficiency appeared to be
inherent to each culture (derived from an individual mouse), as the
same culture exhibited largely consistent reprogramming efficiency to
iPS cells between independent experiments or to induced neurons
(Extended Data Fig. 1n, o). This increased variability in reprogramming
efficiency between fibroblast cultures from different old mice could 
reflect distinct stochastic ageing trajectories. 
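
The distinction drawn above, between a shift in the mean and a change in spread, can be made concrete with a short Python sketch. It normalizes each culture to the median of the young group as a log fold change and then compares group variances. The numbers are hypothetical, and base 2 is an assumption because the logarithm base is not legible in this copy. The sketch only computes sample variances; it does not implement the Fligner-Killeen test named in the Fig. 1 caption.

```python
import math
import statistics


def log2_fold_change(values, reference_values):
    """Log2 fold change of each value over the median of a reference group."""
    reference_median = statistics.median(reference_values)
    return [math.log2(v / reference_median) for v in values]


def summarize(groups, reference="young"):
    """Mean and sample variance of log2 fold change for each age group."""
    summary = {}
    for name, values in groups.items():
        lfc = log2_fold_change(values, groups[reference])
        summary[name] = {
            "mean": statistics.mean(lfc),
            "variance": statistics.variance(lfc),
        }
    return summary


if __name__ == "__main__":
    # Hypothetical reprogramming efficiencies (arbitrary units).
    efficiencies = {
        "young": [1.0, 2.0, 4.0],
        "old": [0.5, 2.0, 8.0],
    }
    for group, stats in summarize(efficiencies).items():
        print(group, stats)
```

In this toy example both groups have the same mean log fold change while the old group has the larger variance, which mirrors the qualitative pattern described in the paragraph above.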

Variability between old individuals has been observed for several bio- 
logical features* °. However, most studies were performed in humans, in 

Fig. 2 | Old fibroblast cultures exhibit a signature of an inflammatory
activated state, which is associated with variability in reprogramming
efficiency. a, Multi-omics characterization of fibroblast cultures. ChIP-seq,
chromatin immunoprecipitation followed by sequencing; UHPLC-MS, ultra-high
performance liquid chromatography-tandem mass spectrometry. b,
Principal component (PC) analysis of transcriptomes of cultures of ear fibroblasts
from young (3 months, n = 8) and old (29 months, n = 10) (left) or only old (right)
mice (3 independent experiments). Old cultures were either good (high 
reprogramming efficiency) or bad (low reprogramming efficiency) 
(Supplementary Table 2a). c, Heat map of significantly differentially expressed 
genes (determined by DESeq2) between young and old fibroblasts described in 
band enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. 
Pathways are colour coded according to significance (one-sided Fisher’s exact 
test with Benjamini-Hochberg correction; black, false-discovery rate (FDR)- 
adjusted P< 0.05; grey, FDR-adjusted P< 0.15; Supplementary Table 2b, c). ECM, 
extracellular matrix. d, Summary of the multi-omics profiling of young and old 
fibroblast cultures (Extended Data Fig. 2).e, Pathway enrichment analysis of 
KEGG pathways associated with good or bad reprogramming efficiency. 
Overlapping significant (FDR-adjusted P < 0.05) KEGG pathways identified in a
regression analysis from bad to good reprogramming efficiencies (n = 18) and in
a separate analysis comparing the five highest and five lowest reprogramming 
efficiency cultures (exact P values can be found in Supplementary Table 3b, d). 
*P<0.05,***P< 0.001; two-sided nominal P value with Benjamini-Hochberg 
correction. f, PAGODA of single-cell RNA-seq from young (n= 30 cells), good and 
bad old (n=31 cells) fibroblasts. Top heat maps, PAGODA clustering of cells. For 
cell PC score, maroon and blue colours indicate increased and decreased
expression of the associated gene sets, respectively. Middle heat map, 
expression of specific cytokine genes (Extended Data Fig. 4g). Bottom heat map, 
single cells from good and bad old fibroblast cultures. Gene expression is shown 
as VST-transformed (variance stabilizing transformation, implemented in 
DESeq2) read counts scaled row-wise. 


which genetic and environmental differences also have a role. We used
the controlled mouse system to understand the stochastic variability 
in reprogramming efficiency between cultures from old mice. Using a 
multi-omics approach, we profiled the transcriptomes, epigenomes 
and metabolomes of young fibroblasts as well as old fibroblasts that 
reprogrammed well (good old) or poorly (bad old) (Fig. 2a and Sup- 
plementary Table 2a). Principal component analysis and unsupervised 
hierarchical clustering showed a separation between young and old 
fibroblasts across datasets (Fig. 2b and Extended Data Fig. 2a-h). Prin- 
cipal component analysis also revealed some separation between the 

Fig. 3 | Age-associated increase in activated fibroblasts and the cytokines
that they secrete drive part of the variability in reprogramming between 
mice. a, Top, PAGODA clustering of single-cell RNA-seq data from young and old 
fibroblasts as in Fig. 2f, showing Thy1 expression. Bottom, proportion of
THY1⁺PDGFRα⁺ fibroblasts in fibroblast cultures of young (3 months, n = 21) and
old (29 months, n= 23) mice measured by FACS (3 independent experiments). 
Fold changes were calculated relative to the median of young mice. P value, two- 
tailed Wilcoxon rank-sum test. Each dot represents a culture from one mouse. 
Lines depict median. b, Left, Percentage of THY1’PDGFRa‘Lin’ out of all 
PDGFRa‘Lin fibroblasts isolated from ears of young mice (3-4 months, n=9 
replicates, each with 2-3 mice) and old mice (24-26 months, n=10 replicates, 
each with 2-3 mice) analysed by FACS (3 independent experiments). Pvalue as in 
a. Each dot represents a replicate, with cells pooled from 2-3 mice. Lines depict 
median. Right, heat map of the expression of specific cytokine genes from 
population RNA-seq of fibroblasts. VST-transformed read counts are shown 
scaled row-wise. Young sig. and old sig. indicate the average expression of genes 
that are significantly downregulated and upregulated with age, respectively. 

c, Spearman’s correlation between the proportion of THY1⁺PDGFRα⁺
(THY1⁺) fibroblasts in a given culture (quantified by FACS) and the
reprogramming efficiency (assessed as in Fig. 1c) of that culture (ages as in a;
young, n = 21; old, n = 23; 3 independent experiments). Fold changes relative to
the median of young mice. P values, two-sided algorithm AS 89 in R. Each dot
represents a culture from one mouse. d, Reprogramming efficiency (RE) of
FACS-sorted old THY1⁺PDGFRα⁺ (THY1⁺) and THY1⁻PDGFRα⁺ (THY1⁻) fibroblasts
treated daily with conditioned medium (CM) from THY1⁻PDGFRα⁺ or


transcriptomes and metabolomes of good old and bad old cultures 
(Fig. 2b and Extended Data Fig. 2i,j). 

Old fibroblasts showed transcriptional enrichment of pathways 
related to secreted factors (for example, cytokine signalling), extracel- 
lular matrix, contractility, inflammation and wound healing (Fig. 2c, d, 
Extended Data Fig. 2k, l and Supplementary Table 2b-e). These features
are characteristic of activated fibroblasts (also known as myofibroblasts),
which are normally involved in tissue repair?**». Indeed,
the ‘fibroblast activation’ gene set was enriched in the old fibroblast 
transcriptomes (Extended Data Fig. 2m and Supplementary Table 2f). 
Epigenomic and metabolomics changes supported this fibroblast acti- 
vation signature (Fig. 2d, Extended Data Fig. 2n-t and Supplementary 

THY1⁺PDGFRα⁺ fibroblasts from the same original culture. log-transformed
fold change relative to THY PDGFRa‘ fibroblasts treated with conditioned 
medium from THYI PDGFRo‘’ fibroblasts (n =5 old mice, 4 independent 
experiments). Pvalues, two-tailed Wilcoxon signed-rank test. Each dot 
represents a culture from one mouse. Lines depict median. e, Reprogramming 
efficiency of pairs of good old and bad old fibroblast cultures treated with their 
own conditioned medium (self conditioned medium) or conditioned medium 
from the other group (swapped conditioned medium). log,-transformed fold 
change relative to bad old self conditioned medium. n=8 pairs of good and old 
bad cultures (5 independent experiments). Pvalues, two-tailed Wilcoxon signed- 
rank test with Benjamini-Hochberg correction. Each dot represents a culture 
from one mouse. Lines depict median. f, Reprogramming efficiency of pairs of 
good old and bad old fibroblast cultures treated with their own conditioned 
medium, which was pretreated with blocking antibodies. log,-transformed fold 
change in reprogramming efficiency relative to bad old conditioned medium 
treated with IgG antibodies. n= 6 pairs of good old and bad old cultures 
(4independent experiments). Pvalues, two-tailed Wilcoxon signed-rank test 
with Benjamini-Hochberg correction. Each dot represents a culture from one 
mouse. Lines depict median. g, Spearman’s correlation between conditioned 
medium and the ratio of IL-6 and TNF levels in the conditioned medium (young, 
n=19; old, n=18; ages as ina; 2 independent experiments). Fold change relative 
to the median of young mice. P values, two-sided algorithm AS 89 in R. Each dot 
represents a culture from one mouse. h, Model for the increased variability in 
cellular reprogramming between mice in vitro. 


Table 2g-m). The transcription factor EBF2, which shows increased 
expression in old fibroblasts, was identified as a potential driver of 
this activated fibroblast signature (Fig. 2d, Extended Data Fig. 2q, u 
and Supplementary Table 2n). Primary fibroblast cultures from 
elderly humans also exhibited increased EBF2 and cytokine-related 
pathway expression (Extended Data Fig. 2v, Supplementary Table 2o, p). 
Notably, fibroblast activation was a top feature associated with good 
reprogramming of old fibroblasts in both transcriptomic and epig- 
enomic datasets (Fig. 2e, Extended Data Fig. 2w and Supplementary 
Table 3a-f). Hence, the fibroblast activation signature is enriched in 
old fibroblasts and correlates with the variability between mice in 
reprogramming. 
Fig. 4 | Wound healing rate is variable between old mice and correlates with 
fibroblast subpopulations with distinct cytokine signatures. a, Ear wound 
healing assays in young (3-4 months, n = 26) and old (24-26 months, n = 28) mice 
(2 independent experiments). Left, ear wound healing curves from young mice 
and the five fastest- and five slowest-healing old mice. Percentage of wound area 
that remains on the indicated day (mean ± s.d.). Right, day of ear wound closure in 
young and old mice. Each dot represents one mouse. Line marks median. 
P values, Fligner-Killeen test to assess difference in variance between age 
groups. b, Single-cell RNA-seq of FACS-sorted PDGFRα⁺Lin⁻ cells from the ear 
wounds of young mice (3-4 months, cells pooled from n = 10 mice) or old mice 
(24-26 months, cells pooled from n = 10 mice), 7 days after induction of wounds. 
Left, t-distributed stochastic neighbour embedding (t-SNE) clustering of cells 
(3,036 total; 1,592 young, 1,444 old) coloured by Seurat clusters or age. Right, 
log₂-transformed fold change in the subpopulations between wounds of young 
and old mice. c, Single-cell RNA-seq of live cells from entire wounds of old mice 
(24 months) with fast-healing (n = 2) and slow-healing (n = 2) trajectories, 7 days 


We wondered whether age-dependent cellular heterogeneity 
could contribute to the variability between individual mice. To determine 
whether fibroblast cultures are heterogeneous, we performed 
single-cell RNA sequencing (RNA-seq) on young, good old and bad old 
cultures. Although the number of single cells profiled was low, the good 
old culture contained a higher proportion of activated cells compared 
to the two bad old cultures (Fig. 2f, Extended Data Fig. 4a-g and Supplementary 
Table 3g). Thus, the proportion of activated fibroblasts may be 
linked to the variability in reprogramming between individual cultures. 

We validated that old fibroblast cultures were enriched in activated 
cells by staining for α-smooth muscle actin (αSMA), a marker of activated 
fibroblasts (Extended Data Fig. 5a). These activated fibroblasts 
were proliferating and did not exhibit senescence markers (for 
example, p16) (Extended Data Fig. 5b-e). Fluorescence-activated 
cell sorting (FACS) analysis of the pan-fibroblast marker PDGFRα 
as well as THY1, which correlates with the activated fibroblast signature, 
confirmed that old fibroblast cultures contained higher proportions 
of THY1⁺PDGFRα⁺ cells (Fig. 3a and Supplementary Table 4a-c). 
THY1⁺PDGFRα⁺ cells expressed fibroblast activation markers, inflammatory 
cytokines and Ebf2 (Extended Data Fig. 5f). Ebf2 knockdown in these 
cells reduced expression of fibroblast activation genes (for example, 
Acta2 (which encodes αSMA), Il6 and Ccl11 (also known as Eotaxin)), 
whereas Ebf2 overexpression in young fibroblasts induced expression 
of cytokines (for example, Il6; Extended Data Fig. 5g, h). In vivo FACS 
after induction of wounds. t-SNE clustering of cells (n = 10,797 total), coloured by 
Seurat clusters or mouse (slow old 1, n = 3,761; slow old 2, n = 2,127; fast old 1, 
n = 2,533; fast old 2, n = 2,376). Bottom, log₂-transformed fold change in the cell 
types between wounds from fast-healing compared to slow-healing old mice. 
d, PAGODA clustering of cells (n = 2,678 total; slow old 1, n = 1,087; slow old 2, 
n = 551; fast old 1, n = 441; fast old 2, n = 599) identified as fibroblasts in c. Top heat 
map, single cells from wounds from old mice with fast- and slow-healing 
trajectories. Bottom heat map, separation of cells based on principal 
component scores for a subset of the top significantly overdispersed gene sets. 
For cell PC score, maroon and blue colours indicate generally increased and 
decreased expression of the associated gene sets, respectively. log₂-transformed 
and normalized gene expression values calculated by Seurat and 
scaled row-wise. Bottom left, log₂-normalized expression values of relevant 
genes. Each dot represents a single cell. Line marks median. Bottom right, 
log₂-transformed fold change in the number of cells in each of the three fibroblast 
subpopulations identified by PAGODA. 


analysis also revealed a higher proportion of THY1⁺PDGFRα⁺ fibroblasts 
in the ears of old mice (Fig. 3b), and these fibroblasts exhibited a fibroblast 
activation signature with expression of inflammatory cytokines 
(Fig. 3b, Extended Data Fig. 5i-k and Supplementary Table 4d-g). Thus, 
activated fibroblasts are enriched in old cultures and old tissues in vivo. 

Notably, FACS analysis of fibroblast cultures corroborated the positive 
correlation between the proportion of activated (THY1⁺PDGFRα⁺) 
fibroblasts in a culture and the ability of this culture to reprogram (Fig. 3c 
and Extended Data Fig. 5l-n). Reprogramming efficiency also correlated 
positively with proliferation and negatively with senescence (Extended 
Data Fig. 5o, p). Thus, the proportion of activated fibroblasts, though 
not more variable with age, correlates positively with reprogramming 
efficiency. 

We next investigated how activated fibroblasts influence reprogramming 
efficiency. Activated THY1⁺PDGFRα⁺ fibroblasts intrinsically 
reprogrammed less efficiently than their non-activated THY1⁻PDGFRα⁺ 
counterparts (Extended Data Fig. 5q, r). By contrast, conditioned 
medium from activated fibroblasts enhanced reprogramming (of both 
activated and non-activated fibroblasts) compared to medium from 
non-activated fibroblasts (Fig. 3d, Extended Data Fig. 5s-u and Supplementary 
Table 4h). Therefore, activated fibroblasts have opposing intrinsic 
and extrinsic effects on reprogramming efficiency, and the relative 
proportions of activated and non-activated fibroblasts in cultures from 
old mice could underlie the variability in reprogramming efficiency. 


To analyse whether extrinsic factors drive the variability in repro- 
gramming efficiency between individual old cultures, we examined 
the difference in reprogramming efficiency between good and bad 
old fibroblast cultures, treated with their own conditioned medium or 
conditioned medium that was swapped between cultures (Fig. 3e and 
Extended Data Fig. 6a-c). Reprogramming pairs of good and bad old 
cultures with swapped conditioned medium reduced the difference 
between their reprogramming efficiencies (Fig. 3e) by more than 60% 
(Extended Data Fig. 6c). Extrinsic factors thus have a substantial role 
in the variability in reprogramming efficiency between old cultures, 
and intrinsic factors are likely to underlie the remainder of the effect. 

We next tested whether cytokines contribute to the role of extrinsic 
factors on the variability between mice. IL-6, TNF and IL-1β, which are all 
secreted by old fibroblast cultures, affected reprogramming in opposing 
directions: IL-6 enhanced reprogramming efficiency (as previously 
reported), whereas TNF and IL-1β impaired reprogramming efficiency 
in young and old fibroblasts (Extended Data Fig. 6d-i). Consistently, 
blocking IL-6 with an antibody reduced reprogramming efficiency, 
whereas blocking TNF improved it (Extended Data Fig. 6j, k). To determine 
whether IL-6 and TNF contributed to the variability between mice 
in reprogramming efficiency, we reprogrammed pairs of good old and 
bad old fibroblast cultures in their own conditioned medium, which was 
pretreated with IL-6- or TNF-blocking antibodies. While blocking IL-6 had 
a minor effect, blocking TNF reduced the difference in reprogramming 
efficiency between pairs of good old and bad old cultures (Fig. 3f) by 
more than 40% (Extended Data Fig. 6l-n). The IL-6:TNF ratio correlated 
with reprogramming efficiency (Fig. 3g and Extended Data Fig. 6o-q). 
Hence, the proportions of activated and non-activated fibroblasts, and 
the ratio of inflammatory cytokines that they secrete (for example, IL-6 
and TNF), could drive the variability between fibroblast cultures of different 
old mice (Fig. 3h). 

Fibroblasts are critical for wound healing in vivo. Although the 
influence of ageing on wound healing has been examined, the 
variability of this response is not known. We assessed the rate of healing 
in wounds on the ears of young and old mice (Fig. 4a). While the median 
wound healing rate was not significantly affected by age, there was an 
increased variability in wound healing rate between old mice, with some 
old mice healing faster and some slower than young mice (Fig. 4a and 
Extended Data Fig. 7a-g). 

To determine the overall fibroblast composition in wounds from 
young and old mice, we performed single-cell RNA-seq on FACS-sorted 
fibroblasts pooled from the wounds of 10 young or 10 old mice, 7 days 
after the induction of wounds—irrespective of wound healing rates 
(Fig. 4b and Extended Data Fig. 7c, d). Fibroblast composition changed 
in wounds from old mice in vivo (Fig. 4b), with subpopulations of fibro- 
blasts exhibiting signatures of fibroblast activation and increased 
cytokine signalling (Extended Data Fig. 8a-f). 

We next performed single-cell RNA-seq on all cells from the wounds 
of old mice with slow- or fast-healing trajectories (Fig. 4c and Extended 
Data Fig. 8g-i). Although epithelial cells were not identified (perhaps 
owing to the isolation protocol or wound composition and as previously 
reported), fibroblasts, endothelial cells and immune cells were 
identified (Fig. 4c and Extended Data Fig. 8j). Notably, fibroblasts were 
more abundant in wounds of slow-healing old mice, whereas immune 
cells were more abundant in wounds of fast-healing old mice (Fig. 4c 
and Supplementary Table 5e). Although the number of mice is low and 
differences in the composition of cells could also be influenced by wound 
stage and isolation properties, fibroblast populations may therefore be 
associated with distinct wound healing trajectories. 

Clustering using both Seurat and pathway and gene set overdispersion 
analysis (PAGODA) on wound fibroblasts from slow-healing or 
fast-healing old mice identified three main subpopulations (A, B and C) that 
were enriched in different aspects of fibroblast activation (Fig. 4d and 
Extended Data Fig. 9d, e; for a combined analysis of both single-cell 
RNA-seq datasets, see Extended Data Fig. 9h-l). Whereas fibroblast 
subpopulation A was present in wounds of both slow- and fast-healing 
mice, fibroblast subpopulation B was more abundant in wounds of 
fast-healing old mice and exhibited increased cytokine expression and 
signalling (for example, Tnf; Fig. 4d, Extended Data Fig. 9d, f, k and 
Supplementary Table 5f, g). Thus, TNF is associated with fast wound healing 
in vivo and bad reprogramming in vitro (fast wound healing might lead 
to fibrosis, which is detrimental). By contrast, fibroblast subpopulation 
C was more abundant in wounds from slow-healing old mice and 
exhibited higher expression of other cytokines (for example, Ccl11) and 
the transcription factor Ebf2 (Fig. 4d, Extended Data Fig. 9d-g, k and 
Supplementary Table 5f, g). Activated fibroblast subpopulations with 
distinct cytokine profiles (for example, TNF compared to IL-6 or CCL11) 
may therefore be associated with increased variability in reprogramming 
in vitro and wound healing trajectories in old mice. 

Our study shows that ageing is associated with an increased variability 
between mice in cellular reprogramming in vitro and in wound heal- 
ing in vivo, perhaps reflecting different ageing trajectories. Increased 
variability is emerging as a common feature of ageing, and we identify 
inflammatory cytokines, including TNF, as key contributing factors to 
variability in reprogramming efficiency (although other intrinsic and 
extrinsic factors may also exist). Cytokine signalling may also regulate 
the variability in other ageing phenotypes, including wound healing. 
Dermal fibroblasts have been shown to lose cellular identity and acquire 
adipogenic traits during ageing, and this increased cellular heterogeneity 
could also contribute to the differences between individual mice. 
As fibroblasts exhibit tissue-specific properties, variability in distinct 
tissues may differentially increase with age. 

A subpopulation of activated fibroblasts could be a source of chronic 
inflammation in old individuals and contribute to immune cell recruitment. 
Activated fibroblasts (which proliferate) and senescent 
fibroblasts (which show permanent cell cycle arrest) secrete overlapping 
yet distinct sets of cytokines and may interact in a complex manner 
to influence reprogramming and wound healing. Wound healing is a 
major issue for elderly individuals, with either deficient wound healing 
(which can lead to ulcers) or excessive wound healing (which can lead to 
fibrosis). Changes in fibroblast subpopulations and cytokines with 
age could contribute to these pathologies and constitute targets for 
personalized strategies to restore functional wound healing in elderly 
individuals. 
Methods 


Mice 

All mice used in this study were male C57BL/6 mice. Mice of different 
ages (3-29 months) were obtained from the National Institute on Ageing 
(NIA) colony, and were acclimatized to the animal facility at Stanford 
University for at least 1 week before being processed. No live animals 
were censored. For most animal experiments, young and old mice were 
processed in an alternate manner rather than in two large groups, to 
minimize group effects, and no blinding was performed. All experi- 
mental procedures were approved by Stanford’s Administrative Panel 
on Laboratory Animal Care and were in accordance with institutional 
and national guidelines. At Stanford University, all mice were housed 
in the Comparative Medicine Pavilion, and their care was monitored 
by the Veterinary Service Center at Stanford University under IACUC 
protocol 8661. 


Collection of blood and plasma from young and old mice 

To assess the systemic changes associated with age, whole blood was 
collected from young and old mice by cardiac puncture into a tube 
containing EDTA (Thermo Fisher Scientific, AM9262) (for a final concentration 
of 5 mM EDTA per blood sample). Blood cell composition, 
including white and red blood cell, granulocyte, monocyte, lymphocyte 
and platelet counts were analysed with a Hemavet Multispecies Hema- 
tology Analyzer (CDC Technologies) according to the manufacturer’s 
instructions. Plasma was prepared from whole blood samples by two 
consecutive centrifugation steps at 500 r.c.f. and 13,000 r.c.f., respec- 
tively, each for 10 min at room temperature, and then aliquoted and 
stored at -80 °C for cytokine profiling (see ‘Cytokine profiling analysis 
on plasma and conditioned medium using Luminex multi-analyte’). 


Generation of primary cultures of fibroblasts from young and old 
mice 

To investigate the effect of ageing on tissue fibroblasts, primary fibro- 
blast cultures were established from the ears and lungs of young and 
old mice. To this end, the ears and lungs were cut into small fragments 
(approximately 1 mm) and digested in Dulbecco’s modified Eagle 
medium (DMEM, Invitrogen, 11965-092) supplemented with 0.14 Wünsch 
units ml⁻¹ of Liberase TM (Roche, 05401127001) for 30-90 min. The 
fragments were washed with DMEM supplemented with 15% fetal bovine 
serum (FBS, Gibco, 16000-044, lots 551495 and 1551824) and plated on 
tissue culture plates with DMEM supplemented with 15% FBS and 1% peni- 
cillin-streptomycin-glutamine (PSQ) (Gibco, 10378). To isolate primary 
adult fibroblasts from the chest area, the skin on the chest was dissected 
from the animals, the subcutaneous fat and fascia were removed, and 
the tissues were incubated overnight at 4 °C with the epidermal layer 
of the skin facing down on top of a solution of 0.25% trypsin (Gibco, 
25200-056). The following day, the epidermis was removed, tissues 
were cut into small fragments (approximately 1 mm³) and treated with 
1,000 U ml⁻¹ collagenase I (Gibco, 17100017) in DMEM for 60-90 min at 
37 °C. Digested fragments were funnelled through a 70-µm nylon mesh 
(Fisher Scientific, 08-771-2), washed with fibroblast growth medium 
(DMEM supplemented with 10% FBS and 1% PSQ) and plated using the 
same medium. The cells were passaged once, before being aliquoted, 
frozen and stored in liquid nitrogen (passage 1.5). For all experiments, 
unless stated, fibroblasts were thawed (passage 2) and cultured at 37 °C 
in 5% CO₂ and 95% humidity in fibroblast growth medium. All experiments, 
unless specifically noted, were performed at passage 3. 


FACS analysis of primary fibroblasts 

To determine the purity of the primary fibroblasts from young and old 
mice, FACS analysis was performed on fibroblast cultures at passage 3. 
FACS analysis was performed using an LSR II flow cytometer (BD Biosciences) 
and analysed using FlowJo v.10.0.7. For FACS analysis, fibroblasts 
were stained with phycoerythrin-conjugated CD140a (BioLegend, 
135905), in combination with the following allophycocyanin-conjugated 
antibodies: B220 (eBioscience, 47-0452-82), CD3 (BD Pharmingen, 
557597), Gr-1 (eBioscience, 17-5931-82), F4/80 (eBioscience, 17-4801-82), 
Siglec H (BioLegend, 129611), CD11c (eBioscience, 17-0114-82) and 
propidium iodide staining solution (BD Pharmingen). 


Generation of primary cultures of fibroblasts from young and old 
human individuals 

To determine whether primary fibroblasts from humans also exhibit 
an inflammatory profile, we collected biopsies from humans at differ- 
ent ages. Stanford Human Subjects approval and informed consent 
was obtained before all study procedures (under protocol ID 25269, 
IRB 350). Biopsies were collected from male participants of different 
ages with four biological grandparents of Ashkenazi Jewish descent, 
generally healthy without thyroid disease, diabetes, immunodeficiency, 
ongoing cancer or autoimmune disease, and no history of poor wound 
healing (Supplementary Table 1g). A 4-mm punch biopsy of pre-auricular 
skin was obtained after injection of 1% lidocaine with epinephrine 
(1:1,000,000). Skin biopsies were rinsed with PBS, cut into smaller fragments 
(around 1 mm³) and plated into a dry 6-well tissue-culture plate. 
Excess PBS was removed, and fibroblast growth medium (DMEM supplemented 
with 10% FBS and 1% PSQ) was added. Tissues were incubated at 
37 °C in 5% CO₂ and 95% humidity. After 24 h, tissues were supplemented 
with fibroblast growth medium, and the medium was changed every 
3-4 days. The cells were passaged once, before being aliquoted, frozen 
and stored in liquid nitrogen (passage 1.5). 


Cytokine profiling analysis on plasma and conditioned medium 
using Luminex multi-analyte 

We examined the effect of ageing on the inflammatory profiles by per- 
forming cytokine profiling on plasma and conditioned medium from 
fibroblast and iPS cell cultures from young and old mice. Plasma was 
collected as described above (Supplementary Table 1a). Conditioned 
medium from young and old mouse (ear and lung) and human (skin) 
fibroblasts was collected 48 h after plating from 150,000-200,000 
primary fibroblasts (passage 3 or 33) plated in a 6-cm dish with 2 ml of 
fibroblast growth medium (Supplementary Table 1b-d). Conditioned 
medium from iPS cells (passage 23; Extended Data Fig. 1g and Supple- 
mentary Table 1e) was collected 24 h after plating from 500,000 cells 
maintained in serum- and feeder-free culture conditions in 2i medium 
(see ‘Cytokine profiling analysis on plasma and conditioned medium 
using Luminex multi-analyte’ for more information). Conditioned 
medium from cultures of THY1⁺PDGFRα⁺ and THY1⁻PDGFRα⁺ FACS-sorted 
young and old fibroblasts (passages 4-6, see ‘FACS and analysis 
of primary fibroblasts’ for FACS sorting protocol) was collected 24 h 
after plating from 0.5-1 million cells plated in a 15-cm dish with 20 ml of 
medium (Supplementary Table 4h). Conditioned medium was collected, 
centrifuged at 10,000 r.c.f. for 10 min at room temperature, aliquoted 
and stored at −80 °C. For all of these conditions, cell numbers were 
determined for each plate by counting on a haemocytometer for normalization 
purposes. In addition, cell-free medium was used to assess 
background fluorescence. All cytokine profiling was performed by the 
Stanford Human Immune Monitoring Center using a Luminex mouse 
38-plex or a human 62-plex analyte platform (eBiosciences/Affymetrix) 
that detects 38 or 62 secreted proteins, respectively. 

All plasma samples were measured in technical duplicates and all conditioned 
medium samples were measured in single technical replicates 
as per recommendation of the Human Immune Monitoring Center at 
Stanford University. All of our analyses were performed using mean 
fluorescence intensity (MFI) values, because converting MFI to clinically 
relevant measures (such as pg ml⁻¹) can introduce a degree of error. 
We report pg ml⁻¹ conversions in Supplementary Table 1a-e to facilitate 
comparison with existing literature. To compare values across plates 
and independent experiments, the MFI values were normalized to the 
median of young (3 months) within each experiment, generating fold 
change values. In addition, the conditioned medium levels were normalized 
to the cell number of the same dish. Two plasma samples from old 
mice were discarded, as the coefficient of variation was >20% for most of 
the cytokines measured between the two technical replicates for these two 
plasma samples. Ranked fold changes in cytokine levels were calculated 
by multiplying the log₂-transformed median fold change (old/young) 
by the −log₁₀(P) values. Similarly, ranked Spearman ρ correlations were 
calculated by multiplying the Spearman ρ values by the −log₁₀(P) values. 
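
The normalization and ranking steps described in this section can be illustrated with a minimal Python sketch. It is not the analysis code used in the study: the function names, the input layout and the use of the sample standard deviation for the coefficient of variation are all assumptions made for illustration.

```python
import math
from statistics import median


def fold_change_to_young(mfi_by_sample, young_ids):
    """Divide every MFI value by the median MFI of the young samples
    measured in the same experiment (hypothetical input layout)."""
    reference = median(mfi_by_sample[s] for s in young_ids)
    return {sample: value / reference for sample, value in mfi_by_sample.items()}


def coefficient_of_variation(rep1, rep2):
    """Percent CV of two technical replicates.
    Assumes the sample standard deviation, which for two values
    equals |rep1 - rep2| / sqrt(2)."""
    mean = (rep1 + rep2) / 2
    sd = abs(rep1 - rep2) / math.sqrt(2)
    return 100 * sd / mean


def ranked_fold_change(young, old, p_value):
    """log2(median old / median young) multiplied by -log10(P)."""
    log2_fc = math.log2(median(old) / median(young))
    return log2_fc * -math.log10(p_value)


if __name__ == "__main__":
    # Made-up fold-change values for one cytokine, for illustration only.
    print(ranked_fold_change([1.0, 1.0, 1.0], [4.0, 4.0, 2.0], 0.01))
```

Weighting the effect size by −log₁₀(P) ranks cytokines that change both strongly and consistently above those that change strongly in only a few samples.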


Lentiviral production for reprogramming 

To induce reprogramming in fibroblasts and generate iPS cells, we used 
the lentiviral vector 4F STEMCCA-loxP, containing a floxed version of 
EF1α-STEMCCA enabling the expression of human OCT4, KLF4, SOX2 
and MYC. Lentiviruses were produced in human embryonic kidney 293T 
(HEK293T, ATCC, CRL-11268) packaging cells. The HEK293T cell line was 
not authenticated in-house, but mycoplasma testing was conducted at 
regular intervals (every 2-3 months). The day before transfection, 9 × 10⁶ 
HEK293T cells were plated in a 10-cm dish in HEK293T medium (DMEM 
supplemented with 10% FBS, 1% PSQ). The next day, the cells were transfected 
as follows: 100 µl of 1 mg ml⁻¹ polyethylenimine (PEI; Polysciences, 
23966-2, linear 25 kDa) was added to 2 ml of DMEM and incubated for 
10 min at room temperature. The lentiviral vector of interest (20 µg) was 
mixed with lentiviral packaging vectors (1 µg of pHDM-tat1b (PlasmID), 
1 µg of pRC-CMV-rev1b (PlasmID), 1 µg of pHDM-Hgpm2 (PlasmID)) and 
envelope vector (2 µg of HDM-VSV-G (PlasmID)), added to the PEI-DMEM 
mixture and incubated for 15 min at room temperature. The PEI-DMEM-DNA 
mixture was then added dropwise to the HEK293T cells, and 12 h 
after transfection the medium was replaced with 8 ml fresh HEK293T 
medium. Viral supernatants were collected at 24 and 36 h after transfection, 
centrifuged at 3,000 r.p.m. for 15 min, and carefully transferred 
into a fresh tube, after which 0.7 ml of the crude virus supernatant was 
used to reprogram primary fibroblasts (see ‘Reprogramming of young 
and old fibroblasts to iPS cells and characterization of the iPS cells’). 


Reprogramming of young and old fibroblasts to iPS cells and 
characterization of the iPS cells 

We generated iPS cell lines from three independent young fibroblast 
cultures and from three independent old fibroblast cultures (Supplementary 
Table 1f). Reprogramming of primary fibroblasts was induced 
as follows: 100,000 primary fibroblasts at passage 3 were plated in a 
well of a 6-well plate, and were infected 24 and 36 h after plating with 
0.7 ml crude virus supernatant mixed with 8 µg ml⁻¹ polybrene (Sigma-Aldrich, 
H9268-5G). Next, 48 h after plating (12 h after the last round of 
infection), the infected primary fibroblasts were plated at a density of 
5,000 cells on a 10-cm dish containing 1.5 × 10⁶ γ-irradiated feeder cells 
(mouse embryonic fibroblasts (MEFs)). Cells were maintained in fibroblast 
growth medium for 7 days, and then switched to mouse embryonic 
stem (mES) cell medium, consisting of DMEM, GlutaMax (Life Technolo- 
gies, 10569-010), 15% FBS, 1% PSQ, 5 x 10° units of leukaemia inhibitory 
factor (EMD Millipore, ESG10007), 1% MEM nonessential amino acids 
(Gibco, 11140-050) and 0.0008% β-mercaptoethanol (Sigma-Aldrich, 
M-7522). On days 13-15 after infection, colonies with a distinct mES 
cell morphology were manually picked from 10-cm dishes and each iPS 
cell clone was transferred into a well of a 96-well plate (primary plate) 
in the presence of γ-irradiated MEFs. A minimum of 24 iPS cell clones 
per parental fibroblast line were picked and replicates of each 96-well 
primary plate were created. These replicate plates were used to evalu- 
ate the number of viral integrations in each clone, whereas the primary 
plates were temporarily frozen and stored at −80 °C. To determine the 
number of viral integrations, on-plate genomic DNA extractions were 
performed as previously described, and the Mouse TaqMan Copy 
Number Reference Assays from Thermo Fisher were used to estimate 
the number of viral integrations from the genomic DNA extracted. A 
TaqMan probe targeting the human KLF4 gene (FAM dye labelled) was 
used because the 4F STEMCCA-loxP vector contains the human version 
of the reprogramming factors (Life Technologies, 4331182). A TaqMan 
probe targeting the mouse transferrin receptor gene (Tfrc), which is 
known to be encoded by a single gene in the mouse genome, was used 
as the reference (VIC dye labelled) (Life Technologies, 4458366). Only 
iPS cell clones with an estimated viral integration number equal to or 
lower than 3 were chosen for further analysis. 
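
As a rough illustration of how a duplex TaqMan readout can be turned into an integration estimate, the following Python sketch applies a simple ΔCt calculation. This is an assumed calculation, not necessarily the one used in the study or by the assay vendor's software: it supposes that both assays amplify with about 100% efficiency, that Tfrc is present at two copies per diploid genome, and that no calibrator sample is needed. The function names and the rounding rule are hypothetical.

```python
def estimate_integrations(ct_klf4, ct_tfrc, reference_copies=2):
    """Estimate transgene copies per diploid genome from Ct values.

    Assumption: each cycle of difference between the target (human KLF4)
    and the reference (mouse Tfrc) reflects a two-fold difference in
    template, and Tfrc is present at `reference_copies` per genome.
    """
    delta_ct = ct_klf4 - ct_tfrc
    return reference_copies * 2 ** (-delta_ct)


def passes_integration_filter(ct_klf4, ct_tfrc, max_copies=3):
    """Keep clones whose estimate, rounded to the nearest integer,
    does not exceed the threshold of 3 integrations given in the text."""
    return round(estimate_integrations(ct_klf4, ct_tfrc)) <= max_copies


if __name__ == "__main__":
    # Made-up Ct values: the target appears one cycle later than the reference.
    print(estimate_integrations(26.0, 25.0))
    print(passes_integration_filter(26.0, 25.0))
```

In practice a calibrator sample with a known copy number is often included, in which case the estimate would be scaled by the calibrator's ΔCt rather than relying on the equal-efficiency assumption.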

For 13 of these lines, we generated transgene-free iPS cell lines by 
excising the reprogramming factor construct and performed long-term 
passaging (until passage 23), as this is known to improve the pluripotency 
state. To this end, primary plates were quickly thawed and the 
iPS cell clones were transferred into a fresh 96-well plate in the presence 
of γ-irradiated MEFs, and subsequently expanded. At passage 10, the 
integrated 4F STEMCCA lentiviral construct was excised using Cre-recombinase 
expressed under the CAG promoter (pCAG-Cre). The 
pCAG-Cre construct was transfected using a Mouse ES Cell Nucleofector 
Kit (LONZA, V4XP-3012) according to the manufacturer’s instructions. 
Transfected cells were then resuspended in mES cell medium, plated 
on feeder cells at a very low density in a 10-cm dish (500 cells per dish) 
and cultured in mES cell medium until colonies appeared. For each 
iPS cell clone, multiple subclones were isolated and expanded. The 
efficiency of Cre-recombinase excision was assessed by PCR using the 
Mouse TaqMan Copy Number Reference Assays as described above. 
Only transgene-free iPS cell clones were further characterized. iPS cell 
lines were maintained on ES cell medium for 10 passages after excision, 
before being adapted to serum- and feeder-free culture conditions in 2i 
medium according to the CReM Boston University ES cell culture proto- 
cols (http://www.bu.edu/dbin/stemcells/protocols.php). All molecular 
characterizations of the iPS cell lines were performed at passage 23, 
including the inflammatory, transcriptomic and metabolomics profil- 
ing (Extended Data Fig. 3). 

To assess whether the derived iPS cell lines could give rise to cell 
types from all three germ layers after formation of embryoid bodies, 
we induced the formation of embryoid bodies. In brief, iPS cells at pas- 
sage 23 were incubated with accutase (EMD Millipore) for 5 min at 37 °C 
to obtain a single-cell suspension and 10 ml of the iPS cell suspension 
at a density of 10° cells per ml was plated on ultralow attachment plates 
(Corning). Cells were allowed to form embryoid bodies. After 4 days, 
embryoid bodies were transferred into regular tissue-culture-grade 
plates in DMEM high glucose supplemented with 10% FBS, 100 U ml⁻¹ 
penicillin and 100 µg ml⁻¹ streptomycin (Gibco), and embryoid bodies 
were allowed to differentiate. At day 14 after embryoid body differentia- 
tion, differentiated cells were collected and analysed by qRT-PCR for 
the expression of endodermal, mesodermal and ectodermal markers 
(primer sequences are listed in Supplementary Table 6a). 


RT-qPCR on iPS cells and differentiated cells from embryoid 
bodies 

To assess the expression of specific genes in iPS cells and in differentiated 
cells from embryoid bodies, RNA purification and cDNA synthesis were 
performed. To this end, total RNA was isolated using the RNeasy RNA 
Purification Kit (QIAGEN) and 0.5-1 µg of RNA was reverse-transcribed 
using the High Capacity cDNA Reverse Transcription Kit (Applied Biosystems) 
according to the manufacturer’s instructions. cDNA was used 
for RT-qPCR on the BioRad iCycler using iQ SYBR Green Mix (BioRad). 
Hprt1 was used as the housekeeping gene for normalization. All primer 
sequences are listed in Supplementary Table 6a. 
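
Normalization to a housekeeping gene is commonly done with the 2^(−ΔCt) method, sketched below in Python. The text states only that Hprt1 was used for normalization, so the exact calculation, the assumption of roughly 100% amplification efficiency and the function names are illustrative assumptions.

```python
def relative_expression(ct_target, ct_hprt1):
    """Expression of a target gene relative to Hprt1 (2^-dCt).
    Assumes both primer pairs amplify with ~100% efficiency."""
    return 2 ** -(ct_target - ct_hprt1)


def fold_change(ct_target_sample, ct_hprt1_sample,
                ct_target_control, ct_hprt1_control):
    """Fold change of a sample over a control (2^-ddCt), for example
    differentiated embryoid-body cells over undifferentiated iPS cells."""
    sample = relative_expression(ct_target_sample, ct_hprt1_sample)
    control = relative_expression(ct_target_control, ct_hprt1_control)
    return sample / control


if __name__ == "__main__":
    # Made-up Ct values for a germ-layer marker, for illustration only.
    print(fold_change(22.0, 20.0, 24.0, 20.0))
```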


Assessment of reprogramming efficiency 

To determine the impact of ageing on iPS cell generation, reprogramming 
efficiency was quantified using a 96-well assay as previously 
described. In brief, reprogramming was induced as described above. 
Then, 48 h after plating (12 h after the last round of lentiviral infection), 
the infected primary fibroblasts were plated at a density of 20-40 cells 
per well into 96-well plates containing 1,000 γ-irradiated feeder MEFs 
per well. In experiments using cytokines and conditioned medium, 0.1% 
gelatin-coated plates (Tribec Science, TBS8004) without feeder cells 
were used to avoid confounding factors from the feeder cells. 

Infected primary fibroblasts were maintained on fibroblast growth 
medium until day 7 after plating and then switched to mES cell medium 
until day 13-15. For experiments assessing the effect of conditioned 
medium on reprogramming efficiency, fresh conditioned medium was 
collected from 10-25-cm dishes in which cells were grown in parallel, 
centrifuged at 10,000 r.c.f. for 10 min at room temperature and added 
every day, starting from day 1 of replating into 96-well plates until the 
end of experiment. For experiments testing the influence of specific 
cytokines on reprogramming efficiency, fresh medium with the indi- 
cated cytokine was added every day until the switch to iPS cell medium. 

To assess reprogramming efficiency, staining with AP (an early marker 
of pluripotency) was performed by fixing the cells in 4% paraformaldehyde 
(Santa Cruz Biotechnology, sc-281692) for 15 min at room temperature, 
washing with citrate solution (Sigma-Aldrich, 3861) and subsequent 
staining with prepared diazonium salt solution (Sigma-Aldrich, 851) with 
naphthol (Sigma-Aldrich, 855) overnight. Quantification was performed 
by counting the number of wells containing at least one AP⁺ colony. 
To complement AP staining, we also used staining with stage-specific 
embryonic antigen 1 (SSEA1) (a later marker of pluripotency). SSEA1 
staining was performed using StainAlive mouse anti-mouse antibody 
(Stemgent, 09-0067) according to the manufacturer’s recommendations. 
Quantification was performed by counting the number of wells 
containing at least one SSEA1⁺ colony using a Zeiss inverted microscope 
(Zeiss AxioVision A10). 

Reprogramming efficiency was calculated as the number of AP⁺ or 
SSEA1⁺ clones, divided by the number of cells plated, and multiplied 
by the efficiency of viral infection (see ‘Immunofluorescence staining 
of reprogramming factors and pluripotency markers’). To compare 
reprogramming efficiencies across plates (and independent experi- 
ments), the reprogramming efficiencies of all individual cultures were 
normalized to the median reprogramming efficiency of young cultures 
within a given experiment. Statistical differences in variance in repro- 
gramming efficiency between the age groups were calculated using the 
non-parametric Fligner-Killeen test using R v.3.3.0. To assess whether 
the increased variability in reprogramming efficiency with age was intro- 
duced by pooling multiple cohorts, we performed a permutation test 
in which the null distribution was estimated by randomly assigning the 
age groups to the observed reprogramming efficiencies for individual 
cultures within each cohort, and the mean difference in standard deviation 
between young and old cells was calculated across the cohorts. This was 
repeated 1,000 times, and the P value was calculated as the percentage 
of differences greater than or equal to the actual observed difference 
in standard deviation. This approach indicated that the increased 
variability in reprogramming efficiency with age is not simply caused by 
the pooling of multiple cohorts (P < 0.001). 
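
The within-cohort permutation scheme described above can be sketched as follows. This is a minimal illustration in Python and not the R code used for the study; the function names, the data layout (a dictionary mapping each cohort to a list of (age group, normalized efficiency) pairs) and the example values are assumptions made for the sketch.

```python
import random
import statistics


def sd_difference(cultures):
    """SD(old) - SD(young) for one cohort of (age_group, efficiency) pairs."""
    young = [eff for age, eff in cultures if age == "young"]
    old = [eff for age, eff in cultures if age == "old"]
    return statistics.stdev(old) - statistics.stdev(young)


def mean_sd_difference(cohorts):
    """Average the within-cohort SD difference across all cohorts."""
    return statistics.mean(sd_difference(c) for c in cohorts.values())


def permutation_p_value(cohorts, n_permutations=1000, seed=0):
    """Shuffle age labels within each cohort and compare with the observed value."""
    rng = random.Random(seed)
    observed = mean_sd_difference(cohorts)
    hits = 0
    for _ in range(n_permutations):
        shuffled = {}
        for name, cultures in cohorts.items():
            labels = [age for age, _ in cultures]
            rng.shuffle(labels)  # labels never leave their own cohort
            shuffled[name] = [(lab, eff) for lab, (_, eff) in zip(labels, cultures)]
        if mean_sd_difference(shuffled) >= observed:
            hits += 1
    return hits / n_permutations


# Hypothetical normalized efficiencies, for illustration only.
example_cohorts = {
    "cohort_1": [("young", 1.0), ("young", 1.1), ("young", 0.9),
                 ("old", 0.2), ("old", 2.5), ("old", 1.4)],
    "cohort_2": [("young", 0.95), ("young", 1.0), ("young", 1.05),
                 ("old", 0.1), ("old", 1.9), ("old", 3.0)],
}
p_value = permutation_p_value(example_cohorts)
```

Because labels are shuffled only within a cohort, cohort-to-cohort differences in mean efficiency cannot by themselves produce a large difference in standard deviation under the null. Each cohort needs at least two cultures per age group for the standard deviation to be defined.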


Immunofluorescence staining of reprogramming factors and 
pluripotency markers 

For immunofluorescence staining of pluripotency markers, cells were 
fixed in 4% paraformaldehyde for 15 min at room temperature, then 
permeabilized with 0.5% Triton X-100 for 10 min, blocked in blocking 
solution (2% bovine serum albumin (BSA), 5% glycerol, 0.2% Tween-20, 
0.1% sodium azide in PBS) for 1h, followed by incubation with primary 
antibodies. The following antibodies were used for immunofluores- 
cence: rabbit anti-OCT3/4 (Santa Cruz Biotechnology, sc9081), SSEA1 
StainAlive mouse anti-mouse antibody (Stemgent, 09-0067) and rabbit 
anti-SOX2 (Santa Cruz Biotechnology, sc17320). The nuclei were stained 
with DAPI (Life Technologies). Cells were imaged using a Zeiss inverted 
microscope (Zeiss AxioVision A10) with AxioVision v.4.7.2 software. For 
calculations of the infection efficiency, 5-10 images were randomly 
taken per sample and uploaded in ImageJ (v.1.46r), and the infection 
efficiency was calculated by dividing the number of OCT4⁺ cells by the 
total number of cells (as determined by DAPI staining). 
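
As an illustration of the infection-efficiency calculation, the counts from the 5-10 images per sample can be pooled before taking the ratio. The sketch below assumes pooling across images (rather than averaging per-image ratios), which the text does not specify; the function name and the example counts are hypothetical.

```python
def infection_efficiency(images):
    """Pool counts over images: OCT4-positive cells / DAPI-stained nuclei.

    `images` is a list of (oct4_positive, dapi_total) count pairs,
    one pair per randomly acquired image.
    """
    oct4_positive = sum(oct4 for oct4, _ in images)
    total = sum(dapi for _, dapi in images)
    if total == 0:
        raise ValueError("no DAPI-stained nuclei were counted")
    return oct4_positive / total


# Hypothetical counts from five images of one sample.
example_images = [(42, 60), (35, 50), (51, 70), (28, 40), (44, 60)]
efficiency = infection_efficiency(example_images)  # 200 / 280
```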


Reprogramming of young and old fibroblasts to induced neurons 

To determine the ability of young and old primary fibroblasts to 
reprogram to induced neurons, induced neuron reprogramming was performed 
as previously described. In brief, young and old fibroblast cultures 
(passage 3) were plated at a density of 60,000 cells per well in a 12-well 
plate. The following day, the fibroblasts were infected as described 
above with lentiviruses carrying TetO-FUW-ASCL1 (Addgene, 27150), 
TetO-FUW-BRN2 (Addgene, 27151), TetO-FUW-MYT1L (Addgene, 27152) 
and FUW-rtTA (Addgene, 20342). The next day, doxycycline (2 µg ml⁻¹, 
Sigma-Aldrich) in fibroblast growth medium was added to the wells. 
Medium was changed to neuronal medium (N2, B27, DMEM/F12 
(Invitrogen), 1.6 ml insulin (6.25 mg ml⁻¹, Sigma-Aldrich)) and doxycycline 
(2 µg ml⁻¹) two days after the first doxycycline induction. Subsequently, 
neuronal medium was changed every three days. To determine the number 
of induced neurons at day 7 for each fibroblast culture, the cells were 
digested using 0.25% trypsin (Invitrogen) at 37 °C for 5 min, and all cells 
were subjected to magnetic activated cell sorting (MACS) to select for 
APC-conjugated PSA-NCAM⁺ cells (Miltenyi, 130-093-273), according 
to the manufacturer’s instructions. The number of PSA-NCAM⁺ cells for 
each fibroblast culture was counted manually using a haemocytometer. 
The reprogramming efficiency for each line was obtained by dividing 
the total number of PSA-NCAM⁺ cells obtained at day 7 by the number 
of fibroblasts plated. The ability of the primary fibroblast cultures to 
undergo induced neuron and iPS cell reprogramming was assessed 
in parallel. Note that in this comparison, infection efficiency was not 
assessed and hence not included in the calculation of reprogramming 
efficiency. 


RNA-seq analysis 

To profile transcriptomic changes in primary fibroblast cultures with 
age and after iPS cell reprogramming, total RNA was isolated from pas- 
sage 3 fibroblasts and passage 23 iPS cells using the RNeasy kit (QIAGEN) 
according to the manufacturer’s instructions. Total RNA (150 ng) was 
used to prepare RNA-seq libraries using the Encore Complete RNA-seq 
library kit (Nugen Technology, 0333), according to the manufacturer’s 
instructions. Libraries were sequenced on HiSeq 2000 (2 x 10 bp paired- 
end reads, Illumina). 

Quality and adaptor trimming of the Fastq files was performed using 
TrimGalore v.0.2.8, retaining reads with a minimum Phred score of 15. 
The trimmed reads were mapped to the mouse genome (mm9 build) 
using TopHat (v.2.0.8b). Reads per genes were counted using HTSeq 
(v.0.6.1). As annotation file, we used the genes.gtf downloaded from 
UCSC on 6 March 2013. Gene expression was analysed using DESeq2 
(v.1.20.0). For differential expression analysis of fibroblasts, batch 
effect was accounted for by including a batch variable into the DESeq2 
model (see Supplementary Table 2a). Genes with >0.3 fragments per 
kb of transcript per million mapped reads in at least one sample within 
a particular analysis, were considered expressed and included in the 
analysis. Heat maps, hierarchical clustering and principal component 
analysis (PCA) were performed on VST-transformed values (imple- 
mented in DESeq2). Genes were considered significantly differentially 
expressed if they had FDR-adjusted values of P < 0.05 and an absolute 
fold change >1.5, unless stated otherwise. Publicly available datasets 
were downloaded from the GEO database (Supplementary Table 6b) 
and processed as described above. Note that the following RNA-seq 
samples were excluded from further analyses: (1) two old and three 
middle-aged RNA-seq libraries, as they lacked any young samples and 
hence batch effects could not be corrected for; (2) RNA-seq libraries 
from one good old and one bad old fibroblast culture, as their 
reprogramming efficiency could not be confirmed across several independent 
experiments; (3) RNA-seq libraries from 2 iPS cell lines (out of 
13 total) that failed at the quality-control stage because they showed large 
differences (for example, in number of reads mapped) from the rest 
of the samples (Supplementary Table 1f). 
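
The expression filter and the significance thresholds given above can be written as simple predicates. The following Python sketch is illustrative only: the function names are hypothetical, FPKM is computed with the standard formula (fragments × 10⁹ / (gene length in bp × total mapped fragments)), and the fold change is assumed to be supplied on the log2 scale, as DESeq2 reports it.

```python
import math


def fpkm(fragments, gene_length_bp, total_mapped):
    """Fragments per kb of transcript per million mapped fragments."""
    return fragments * 1e9 / (gene_length_bp * total_mapped)


def is_expressed(fragments_per_sample, gene_length_bp, library_sizes, threshold=0.3):
    """True if FPKM exceeds the threshold in at least one sample."""
    return any(
        fpkm(count, gene_length_bp, size) > threshold
        for count, size in zip(fragments_per_sample, library_sizes)
    )


def is_differentially_expressed(padj, log2_fold_change, alpha=0.05, min_fold=1.5):
    """FDR-adjusted P < alpha and absolute fold change > min_fold."""
    return padj < alpha and abs(log2_fold_change) > math.log2(min_fold)
```

For example, a 2-kb gene with 300 fragments in a library of 10 million mapped fragments has an FPKM of 15 and passes the filter, whereas a single fragment in the same library gives 0.05 and does not.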

Pathway enrichment analysis was performed using one-sided Fisher’s 
exact tests, testing for the overrepresentation of significantly differ- 
entially expressed genes in a given gene list. As background, all of the 
genes that were considered expressed (see above) were used. P values 
were adjusted for multiple hypothesis testing using Benjamini-Hochberg 
correction, and FDR-adjusted P = 0.05 was set as upper threshold. In 
Extended Data Fig. 2m, analysis of gene set enrichment was conducted 
using the gene set enrichment analysis (v.2.2.2) tool. For this analysis, the 
VST-transformed values (derived from DESeq2) were used, and enrich- 
ment statistics were calculated using the ‘classic’ method parameter. 
Nominal P values were calculated based on 10,000 permutations. In 
Fig. 2e and Extended Data Figs. 2w, 5j, 7e, analysis of gene set enrich- 
ment was conducted by calculating the arithmetic mean of gene-wise 
test statistics (Wald test statistic from differential expression analysis 
using DESeq2) per gene set. To calculate a P value for each gene set, we 
constructed a null distribution of test statistics by sampling 10,000 times 
n genes (n indicates the number of genes in the respective gene set) 
and calculating the mean of the test statistics for these genes. A gene 
set-wise P value was then calculated as the percentage of absolute 
(sampled) mean test statistics that were equal to or greater than the absolute 
(observed) mean test statistic for that pathway. P values were corrected 
for multiple hypothesis testing using the Benjamini-Hochberg algorithm 
using FDR-adjusted P = 0.05 as threshold. This method for gene 
set enrichment analysis has been shown to outperform many commonly 
used methods. KEGG and Gene Ontology (GO) terms were acquired from 
http://amp.pharm.mssm.edu/Enrichr/#stats. 
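
The mean-test-statistic approach to gene set enrichment, followed by Benjamini-Hochberg correction, can be sketched as below. This is an illustrative Python reimplementation under stated assumptions, not the code used for the study: genes are sampled without replacement from all genes carrying a test statistic, and the function and variable names are hypothetical.

```python
import random
import statistics


def gene_set_p_value(stats_by_gene, gene_set, n_samples=10000, seed=0):
    """Empirical P value for the mean test statistic of one gene set."""
    rng = random.Random(seed)
    members = [g for g in gene_set if g in stats_by_gene]
    observed = abs(statistics.mean(stats_by_gene[g] for g in members))
    all_stats = list(stats_by_gene.values())
    hits = 0
    for _ in range(n_samples):
        sampled = rng.sample(all_stats, len(members))  # n random genes
        if abs(statistics.mean(sampled)) >= observed:
            hits += 1
    return hits / n_samples


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Taking the absolute value of the mean makes the test two-sided, so gene sets that are coordinately up- or downregulated are both detected, whereas sets with mixed directions tend to cancel out.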

Upstream regulator analysis was performed using Ingenuity Pathway 
Analysis (IPA; QIAGEN) software, using the genes that passed the filter 
in our dataset as reference genome. 

Motif analysis of promoter regions (−1,000 to +50 bp relative to the 
transcription start sites) of differentially expressed genes was performed 
using the Homer software (v.4.8), using the genes that passed the filter 
in our dataset as background. 


Chromatin immunoprecipitation followed by sequencing and 
analysis of the epigenomic landscape 

To profile changes in the epigenomic landscape of primary fibroblasts 
with age, we performed ChIP experiments using anti-H3K4me3 (Active 
Motif, 39159) and anti-H3K27me3 (Active Motif, 39536) antibodies. 
In brief, 1-2 x 10° fibroblasts were crosslinked with 1% formaldehyde 
for 10 min at room temperature, and formaldehyde was quenched by 
addition of glycine to a final concentration of 0.125 M. Chromatin was 
sonicated to an average size of 0.5-2 kb, using Bioruptor (Diagenode). 
A total of 5 µg of antibody was added to the sonicated chromatin and 
incubated overnight at 4 °C on a rotating platform. Subsequently, 10% 
of chromatin used for each ChIP reaction was retained as input DNA. 
Then, 100 µl of protein G Dynal magnetic beads were added to the ChIP 
reactions and incubated for an additional 4 h at 4 °C. Magnetic beads 
were washed, followed by reversal of crosslinks and DNA purification. 
Resultant ChIP DNA was dissolved in water. ChIP and input libraries were 
generated according to the Illumina protocol and sequenced as single- 
end 50 bp reads using the Illumina HiSeq 2000 platform. 

For analysis, Fastq reads were quality-trimmed using the trim-galore 
software (v.0.2.1), with a Phred score threshold of 15 and a minimum 
remaining read length of 36 bp. Trimmed reads were mapped to 
the mm9 genome assembly using Bowtie v.0.12.77. Duplicate reads 
were eliminated using the FIXSEQ software with default parameters. 
ChIP-seq peaks were called in all samples using the MACS (v.2.08) software 
with default settings and the --broad option. Input datasets 
were used as baseline. 

To identify H3K4me3 and H3K27me3 ChIP-seq peaks with differential 
intensity in young compared to old or good old compared to bad old 
samples, we used the DiffBind R package (v.1.12.3). Reads were quantified 
in each sample over ‘meta-peaks’, that is, peaks called using pooled 
reads from one specific mark (H3K4me3 reads and H3K27me3 reads) 
over all samples. Meta-peaks help to best determine peak boundaries. 
ChIP-seq read counts normalized to input read counts by DiffBind were 
then analysed using the DESeq2 package (v.1.6.3) to identify peaks 
with significantly different intensity. Hierarchical clustering and PCA 
were performed on VST-transformed values (implemented in DESeq2). 
The differential peak intensity and pathway analyses were restricted to 
the peaks that extended at least 100 bp into the promoter regions of the 
nearest genes (defined as transcriptional start site +2,000 bp), and were 
performed as mentioned above (see ‘RNA-seq analysis’). 

Broad H3K4me3 domains are genomic regions coated with H3K4me3 
and are enriched at genes involved in cell identity and/or function. 
To compare H3K4me3 breadth across young and old samples, 
we used the approach described previously. In brief, we used the 
H3K4me3 meta-peaks to compare the signal-to-noise ratio across sam- 
ples. This revealed that sample number 2 for H3K4me3 from 3-month- 
old fibroblasts was the noisiest sample of the 5 samples. We therefore 
downsampled all other samples to match the coverage histogram of 
that specific sample. We then called peaks as described above using 
the calibrated files in MACS (v.2.08) and isolated the top 5% broadest 
H3K4me3 domains (broad H3K4me3 domains) from each peak file. 
We identified reproducible broad H3K4me3 domains by retaining only 
those that were present in all young or all old samples, and we restricted 
the analysis to those. 
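
The selection of broad H3K4me3 domains can be illustrated with the short Python sketch below. It assumes peaks are represented as (chromosome, start, end) tuples with half-open coordinates, that the top 5% is taken by peak width, and that a domain counts as present in another sample if it overlaps any broad domain there by at least 1 bp; the overlap criterion and all names are assumptions, as the text does not define them.

```python
def broadest_domains(peaks, fraction=0.05):
    """Return the top `fraction` of peaks ranked by width (end - start)."""
    n_keep = max(1, int(len(peaks) * fraction))
    ranked = sorted(peaks, key=lambda p: p[2] - p[1], reverse=True)
    return ranked[:n_keep]


def overlaps(a, b):
    """True if two (chrom, start, end) intervals share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def reproducible_domains(reference, other_samples):
    """Keep reference domains that overlap a domain in every other sample."""
    return [
        domain
        for domain in reference
        if all(any(overlaps(domain, p) for p in sample) for sample in other_samples)
    ]
```

In use, `broadest_domains` would be applied to each sample's peak file after downsampling, and `reproducible_domains` would then be run within the young group and within the old group separately.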

Bivalent domains are genomic regions coated with both H3K4me3 and 
H3K27me3. To identify differential bivalent regions between young 
and old samples, the H3K4me3 and H3K27me3 peaks that are consistently 
present in young samples were compared to the ones that are consistently 
present in old samples. To define robust bivalently marked regions in each 
age group, we called H3K4me3 and H3K27me3 meta-peaks separately 
at each age. Then, at each age and for each mark, we identified peaks 
that were supported by all of the individual experimental samples (that 
is, reproducible peaks). Bivalent peaks were obtained by the intersec- 
tion of H3K4me3 and H3K27me3 reproducible peaks in all young or old 
samples. Note that the pathway enrichment analysis was restricted to the 
bivalent domains in young that lose H3K27me3 in old and to the H3K4me3 
peaks in young that gain H3K27me3 in old, as these domains are likely to 
exhibit altered expression of their associated genes. 
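
A minimal Python sketch of how bivalent regions and their loss of H3K27me3 with age could be derived from reproducible peak sets is given below. Peaks are assumed to be (chromosome, start, end) tuples with half-open coordinates, and 'loss' is taken to mean no overlap with any reproducible H3K27me3 peak in old samples; these definitions and all names are assumptions for illustration.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) intervals share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def intersect_peaks(k4_peaks, k27_peaks):
    """Regions covered by both H3K4me3 and H3K27me3 reproducible peaks."""
    bivalent = []
    for k4 in k4_peaks:
        for k27 in k27_peaks:
            if overlaps(k4, k27):
                bivalent.append((k4[0], max(k4[1], k27[1]), min(k4[2], k27[2])))
    return bivalent


def lose_k27_in_old(young_bivalent, old_k27_peaks):
    """Young bivalent regions with no H3K27me3 peak in old samples."""
    return [
        region
        for region in young_bivalent
        if not any(overlaps(region, peak) for peak in old_k27_peaks)
    ]


# Hypothetical reproducible peaks, for illustration only.
young_k4 = [("chr1", 1000, 3000), ("chr2", 500, 900)]
young_k27 = [("chr1", 2000, 5000), ("chr2", 100, 700)]
old_k27 = [("chr2", 600, 1200)]
young_bivalent = intersect_peaks(young_k4, young_k27)
lost = lose_k27_in_old(young_bivalent, old_k27)
```

The converse class (H3K4me3 peaks in young that gain H3K27me3 in old) follows the same pattern with the roles of the peak sets exchanged.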

Together, the epigenomic profiling identifies age-dependent changes 
in the epigenomic landscape (for example, H3K4me3 intensity and 
breadth) and reveals enrichment of pathways involved in activated 
fibroblasts, such as cytokines, extracellular matrix components 
and contractility-related features (Fig. 2d, Extended Data Fig. 2n–q, x and 
Supplementary Table 2g-l), corroborating the transcriptomic findings. 


Metabolomics analysis 

To profile changes in metabolomics features in cultured fibroblasts 
with age, frozen cell pellets were mixed with 80% methanol 
(mass-spectrometry-grade) in a ratio of 10 µl per mg cell pellet (a million cells 
weighs roughly 13 mg). The suspension was then processed by three 
rounds of 1 min vortex at maximum speed, chilled briefly on ice. The 
mixture was incubated at 4 °C for 20 min before centrifuging at 20,000g 
for 20 min at 4 °C. The supernatants were used as metabolite extracts 
for liquid chromatography-mass spectrometry analysis. For liquid 
chromatography-mass spectrometry analysis, the metabolite extracts 
were transferred to 150 µl deactivated glass inserts housed in 2-ml brown 
mass-spectrometry vials (Waters). A chemical standard solution (for 
quality control) was prepared from a synthetic complete mixture from 
Sigma-Aldrich (Y1501) at a concentration of 19 µg ml⁻¹ in 80% 
mass-spectrometry-grade methanol (Fisher Scientific). Metabolite extracts 
were analysed in a platform that consisted of a Waters UPLC-coupled 
Exactive Orbitrap mass spectrometer (Thermo), using an OPD2 HP-4B 
column (4.6 mm × 50 mm) and an OPD2HP-4A guard column (Shodex). 
The column temperature was maintained at 45 °C. In brief, 5 µl of each 
sample maintained at 4 °C was loaded by the autosampler in partial loop 
mode 3 times in the positive mode and 3 times in the negative mode. 


The binary mobile phase solvents were: A, 10 mM NH₄OAc in 10:90 
acetonitrile:water; B, 10 mM NH₄OAc in 90:10 acetonitrile:water. Both 
solvents were modified with 10 mM HOAc for positive-mode acquisition 
or 10 mM NH₄OH for negative mode. The 30-min gradient for both 
modes was set as: flow rate, 0.1 ml min⁻¹; 0-15 min, 99% A; 15-20.5 min, 
99% to 1% A; 20.5-25 min, 1% A; 25-25.5 min, 1% to 99% A; 25.5-30 min, 
99% A. The mass-spectrometry acquisition was in profile mode and per- 
formed with an electrospray ionization probe, operating with capillary 
temperature at 275 °C, sheath gas at 40 units, spray voltage at 3.5 kV for 
positive mode and 3.1 kV for negative mode, capillary voltage at 30 V, 
tube lens voltage at 120 V and skimmer voltage at 20 V. The mass scan- 
ning used 100,000 mass resolution, high dynamic range for AGC target, 
500 ms as maximum inject time, and 75-1,200 m/z as the scan range. 
The system was operated by Thermo Xcalibur v.2.1 software. The raw 
data files generated from liquid chromatography-mass spectrometry 
were centroided with the PAVA program and converted to mzXML format. 
Mass feature extraction was performed with XCMS v.1.30.3. Differential 
analysis was performed on signal intensity values derived from 
XCMS using the nonparametric Wilcoxon rank-sum test for positive 
and negative mode separately and adjusted for multiple hypothesis 
testing using q value correction using the R package qvalue (v.2.0.0). 
The mass features that were found significantly different were manually 
searched against the Metlin metabolite database (29381867) using 
5 ppm mass accuracy. Retention time matching with compounds in the 
standard mixture was also performed for a subset of the metabolite 
hits. Before PCA and hierarchical clustering analysis, signal intensity 
values derived from XCMS were range-scaled. Pathway analysis was 
performed using the integrated pathway analysis tool in the 
MetaboAnalyst 3.0 software, using all putatively identified metabolites that 
were found significantly different (FDR-adjusted P < 0.05, absolute fold 
change >1.5) together with all differentially expressed genes from the 
transcriptomic analysis (see above). 
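
For reference, the 30-min mobile-phase gradient listed above can be expressed as a piecewise function of time. The Python sketch below assumes that the composition changes linearly during the two ramp segments (15-20.5 min and 25-25.5 min), which the text implies but does not state explicitly; the names are hypothetical.

```python
# (time in min, % solvent A) breakpoints taken from the gradient description.
GRADIENT_PERCENT_A = [
    (0.0, 99.0),
    (15.0, 99.0),
    (20.5, 1.0),
    (25.0, 1.0),
    (25.5, 99.0),
    (30.0, 99.0),
]


def percent_a(t_min):
    """Percentage of solvent A at time t_min, by linear interpolation."""
    if not GRADIENT_PERCENT_A[0][0] <= t_min <= GRADIENT_PERCENT_A[-1][0]:
        raise ValueError("time is outside the 0-30 min gradient")
    for (t0, a0), (t1, a1) in zip(GRADIENT_PERCENT_A, GRADIENT_PERCENT_A[1:]):
        if t0 <= t_min <= t1:
            return a0 + (a1 - a0) * (t_min - t0) / (t1 - t0)


def percent_b(t_min):
    """Percentage of solvent B, the remainder of the binary mixture."""
    return 100.0 - percent_a(t_min)
```

Under this assumption the midpoint of the first ramp (17.75 min) corresponds to a 50:50 mixture of solvents A and B.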

Collectively, this metabolomics profiling uncovers changes in arginine 
and proline metabolism (Extended Data Fig. 2r-t and Supplementary 
Table 2m), which has been implicated in the regulation of inflammatory 
cytokines and extracellular matrix synthesis (Extended Data Fig. 2t), 
consistent with the characteristics of activated fibroblasts. 


Single-cell RNA-seq analysis of primary cultures of fibroblasts 

To assess the cell composition and heterogeneity of primary fibroblast 
cultures, single cells were isolated from three young and three old 
fibroblast cultures at passage 3 (see Supplementary Table 3g). In brief, 
20 single cells per culture were isolated manually by picking isolated 
cells under a Zeiss inverted microscope (Zeiss AxioVision A10). Single- 
cell RNA-seq libraries were generated using SMARTer Ultra Low Input 
RNA Kit for Sequencing v.3 (Clontech, 634853), according to the 
manufacturer’s instructions. Single cells were directly lysed in 2.5 µl of 
Clontech reaction buffer and the volume was brought up to 10 µl with sterile 
water. First-strand cDNA synthesis was carried out in 96-well PCR plates 
as follows: 1 µl of 3’ SMART CDS Primer II A (24 µM) was added and the 
resulting mix was incubated in a preheated thermocycler at 72 °C for 
3 min and then held at 4 °C. Next, 7.5 µl of first-strand master mix was 
added (SMARTScribe Reverse Transcriptase, 5× First-Strand Buffer, 
dNTP Mix and SMARTer IIA Oligonucleotides), mixed and incubated at 
42 °C for 90 min and 70 °C for 10 min. Finally, the first-strand cDNA was 
purified with SPRI Ampure XP beads; 36 µl of the SPRI beads was added 
to each 20-µl single-stranded cDNA sample, mixed and incubated for 
8 min at room temperature. The samples were placed on a Promega 
MagnaBot II magnetic separation device, the supernatant was discarded, 
and the single-stranded cDNA sample bound to the beads was directly 
used for double-stranded cDNA generation. Next, 50 µl of PCR master 
mix was added to each sample and mixed. Plates were placed in a 
preheated thermal cycler with a heated lid using the following program: 
95 °C for 1 min, 18 cycles of 95 °C for 30 s, 65 °C for 30 s, 68 °C for 6 min, 
followed by 72 °C for 10 min and hold at 4 °C. Amplified double-stranded 
cDNA was purified using SPRI Ampure Beads (Beckman Coulter), eluted in 
12 µl of purification buffer and kept at −20 °C. The quantity and quality 
of 1 µl of the amplified purified double-stranded cDNA were measured 
using the Agilent 2100 BioAnalyzer and Agilent’s High Sensitivity DNA 
Kit (Agilent, 5067-4626). Double-stranded cDNA libraries for which the 
BioAnalyzer results showed no contamination, a distinct peak at around 
2,000 bp and with approximately 2-7 ng of cDNA were selected. This 
resulted in 8-12 single-cell cDNA samples from each culture. To gener- 
ate RNA-seq libraries, we next used Nextera XT DNA Library Preparation 
kit and Nextera XT Index kit (Illumina, FC-131-1096 and FC-131-1002, 
respectively). In brief, 5 µl purified double-stranded cDNA (around 
1 ng total) from the previous step was added into each sample well of a 
96-well plate, and 10 µl Tagmentation (TD) buffer was added into each 
sample and mixed gently. Next, 5 µl amplicon tagmentation mix was 
added to the wells and mixed gently. The 96-well plate was sealed and 
placed in a thermal cycler and incubated at 55 °C for 5 min and held 
at 10 °C. The Tn5 transposase was inactivated by adding 5 µl of 
Neutralization buffer. The tagmented DNA was then amplified by adding 
15 µl of Nextera PCR Master Mix, 5 µl index 1 primers (i7) and 5 µl index 
2 primers (i5) to each sample. The final PCR was performed using the 
following program on a thermal cycler: 72 °C for 3 min, 95 °C for 30 s, 
12 cycles of 95 °C for 10 s, 55 °C for 30 s and 72 °C for 30 s, followed by 72 °C for 
10 min. The PCR products were then purified with Ampure beads. The 
final libraries were assessed using the Agilent 2100 BioAnalyzer and 
Agilent’s High Sensitivity DNA Kit. We generated three pooled libraries 
and sequenced them on three lanes of Illumina HiSeq 2000 as paired-end 
2 × 101-bp reads. Quality and adaptor trimming of the Fastq 
files was performed using TrimGalore v.0.2.8, retaining reads with a 
minimum Phred score of 15. The trimmed reads were mapped to the 
mouse genome (mm9 build) using TopHat (v.2.0.8b). Reads per genes 
were counted using HTSeq (v.0.6.1). As annotation file, we used the 
genes.gtf downloaded from UCSC on 6 March 2013. On average, 7,000 
genes were expressed per cell. Genes with at least 10 reads in 3 single 
cells were considered expressed. Heat maps, hierarchical clustering 
and PCA were performed on VST values (implemented in DESeq2). 
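
The expression criterion for single-cell data (at least 10 reads in 3 single cells) amounts to a simple count filter. The Python sketch below reads '3 single cells' as 'at least 3 cells', which is an interpretation; the function name, the data layout (gene to per-cell read counts) and the example values are hypothetical.

```python
def expressed_genes(counts, min_reads=10, min_cells=3):
    """Return genes with >= min_reads reads in >= min_cells cells.

    `counts` maps each gene name to a list of read counts, one per cell.
    """
    return [
        gene
        for gene, per_cell in counts.items()
        if sum(1 for c in per_cell if c >= min_reads) >= min_cells
    ]


# Hypothetical read counts for three genes across five cells.
example_counts = {
    "Acta2": [25, 0, 14, 10, 3],   # 3 cells with >= 10 reads: kept
    "Thy1": [120, 9, 9, 0, 11],    # only 2 cells reach 10 reads: dropped
    "Pdgfra": [0, 0, 0, 0, 0],     # not detected: dropped
}
kept = expressed_genes(example_counts)
```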


t-SNE and PAGODA analysis of single-cell RNA-seq data from 
cultured cells 

To analyse the single-cell RNA-seq data, we performed t-SNE clustering 
using the Rtsne R package (v.0.14). Single-cell RNA-seq data were analysed 
using PAGODA. PAGODA identifies pathways and sets of genes 
that are overdispersed in the data and separates the cells based on their 
expression patterns. We applied PAGODA to the raw counts of all genes 
that were considered to be expressed. For gene sets, we used all KEGG 
pathways as well as an ‘in vitro fibroblast ageing’ gene set that we defined 
from comparing the population RNA-seq data from young and old fibro- 
blast cultures (Supplementary Table 2b). In addition, we used the list 
of ‘fibroblast activation’ genes, which are genes that have previously 
been associated with fibroblast activation (Supplementary Table 2f). 
We used the PAGODA pipeline with default parameters, unless stated 
otherwise, and used the SCDE package v.0.99.1 in R v.3.2.2. PAGODA 
revealed a relatively strong cell clustering by KEGG cell cycle as well 
as two de novo gene sets (clusters 37 and 119; Extended Data Fig. 4b), 
consisting of many cell-cycle-related genes. We accounted for this 
cell cycle aspect of heterogeneity using the pagoda.subtract.aspect() 
method (see Supplementary Table 3h for the lists of genes in these gene 
sets). After accounting for cell cycle phases, PAGODA identified 74 KEGG 
pathways, 8 de novo gene sets and the in vitro fibroblast ageing and 
fibroblast activation signatures as significantly overdispersed in the 
dataset (Extended Data Fig. 4c). 


Immunofluorescence staining of fibroblast activation markers 
and EdU incorporation 

Immunofluorescence staining was performed as described in ‘Immu- 
nofluorescence staining of reprogramming factors and pluripotency 
markers’. The following antibody was used for immunofluorescence: 
mouse anti-αSMA (Abcam, ab7817). The nuclei were stained with DAPI 
(Life Technologies). Cells were imaged using a Zeiss inverted microscope 
(Zeiss AxioVision A10) with AxioVision v.4.7.2 software. 

EdU (5-ethynyl-2’-deoxyuridine) incorporation in fibroblast cultures 
was visualized using the Click-iT EdU Plus Alexa Fluor 594 Imaging Kit 
(Invitrogen, C10639). Fibroblasts were plated onto glass coverslips (Bellco 
Glass, 194310012A) in wells of 24-well plates at a density of 20,000 cells 
per well. After allowing the cells to attach overnight, fibroblasts were 
incubated in medium containing EdU (10 µM) for 4 h. Cells were then 
fixed (4% paraformaldehyde in PBS) and permeabilized (0.1% Triton 
X-100 in PBS). EdU was detected by click reaction according to the 
manufacturer’s instructions. Cells were incubated in blocking buffer 
(2% BSA, 5% glycerol, 0.2% Tween-20, 0.1% sodium azide in MilliQ water) 
and stained with Alexa Fluor 488-conjugated anti-αSMA (Abcam, 
ab184675). Coverslips were mounted onto slides using ProLong Gold 
with DAPI (Invitrogen, P36931) and imaged on a Nikon Eclipse Ti/Andor 
CSU-W1 spinning disk confocal microscope using Andor Zyla and NIS 
Elements AR software (v.4.30.02). 


Senescence in young and old fibroblast cultures 

We assessed senescence-associated β-galactosidase activity (SA-β-gal) 
in fibroblast cultures using a histochemical staining kit (Sigma-Aldrich, 
CS0030) according to the manufacturer’s recommendations. The nuclei 
were stained with DAPI (Life Technologies). For determining the proportion 
of senescent cells, 5-10 images were randomly taken per sample and 
uploaded on ImageJ (v.1.46r). Senescence rate was calculated by dividing 
the number of SA-β-gal⁺ cells by the total number of cells (DAPI stain). 


FACS and analysis of primary fibroblasts 

We performed FACS analysis and sorting of THY1⁺PDGFRα⁺ and 
THY1⁻PDGFRα⁺ cells from primary fibroblast cultures at passage 3. FACS 
analysis was performed on an LSR II flow cytometer (BD Biosciences), 
and FACS sorting was performed on a BD FACS Aria II sorter, using a 
100-µm nozzle. FACS data were analysed using FlowJo v.10.0.7. Gating was 
determined using fluorescence-minus-one controls for each colour 
used in each FACS experiment to ensure that positive populations were 
solely associated with the antibody for that specific marker (Extended 
Data Fig. 10). For FACS analysis of cultured cells, fibroblasts were stained 
with phycoerythrin-conjugated CD140a (BioLegend, 135905) and FITC 
(fluorescein isothiocyanate)-conjugated CD90.2 (BioLegend, 105305). 
EdU incorporation in fibroblast cultures was assessed by FACS using 
the Click-iT EdU Plus FACS PacBlue Kit (Invitrogen, C10636) in accord- 
ance with the manufacturer’s instructions. In brief, fibroblasts were 
incubated in medium containing EdU (10 µM) for 4 h. Cells were then 
dissociated and resuspended in FACS buffer (1% BSA in PBS). Cell surface 
markers were stained with phycoerythrin-conjugated CD140a (BioLegend, 
135905) and FITC-conjugated CD90.2 (BioLegend, 105305). Cells 
were then fixed (4% paraformaldehyde in PBS) and permeabilized, fol- 
lowed by click reaction to detect EdU, according to the manufacturer’s 
instructions. 


RT-qPCR on cultured fibroblasts 

To assess expression of fibroblast subpopulation-specific genes, 
young and old fibroblast cultures at passage 3 were FACS-sorted into 
THY1⁺PDGFRα⁺ and THY1⁻PDGFRα⁺ fibroblasts (see above for details) 
and purified fibroblast subpopulations were expanded until passage 
5-9. THY1⁺PDGFRα⁺ and THY1⁻PDGFRα⁺ cells were then plated at a 
density of 50,000 cells per well in a 6-well plate. After 4 days, 
RNA was isolated from these fibroblast cultures and cDNA synthesis 
was performed as described in ‘RT-qPCR on iPS cells and differentiated 
cells from embryoid bodies’. Comparisons were made between pairs 
from the same original culture. Hprt1 was used as housekeeping gene 
for normalization. All primer sequences are listed in Supplementary 
Table 6a. 


Knockdown of the transcription factor EBF2 

To test the functional implication of specific transcriptional regula- 
tors, we performed shRNA knockdown experiments. FACS-purified 
THY1⁺PDGFRα⁺ and THY1⁻PDGFRα⁺ fibroblasts at passages 4-9 were 
plated at a density of 50,000 cells per well in a 6-well plate. One day 
after plating, cells were infected by lentiviruses expressing shRNAs. Two 
independent lentiviral shRNA vectors against Ebf2 were used 
(Sigma-Aldrich, TRCN0000081515 and TRCN0000081514). As control, a lentiviral 
shRNA vector against luciferase was used (Sigma-Aldrich, SHC007V). 
To produce lentiviruses, we followed the protocol described above (see 
‘Lentiviral production for reprogramming’). Viral supernatant was collected 
at 24 h after transfection, centrifuged at 3,000 r.p.m. for 15 min 
and transferred into a fresh tube. Next, 0.7 ml of the crude virus supernatant 
was added to THY1⁺PDGFRα⁺ and THY1⁻PDGFRα⁺ fibroblasts. The 
medium was changed 24 h after infection, and the cells were maintained 
in fibroblast growth medium for another 48 h before RNA collection. RNA 
collection and purification, and RT-qPCR, were performed as described 
above (see ‘RT-qPCR on cultured fibroblasts’). 


Overexpression of the transcription factor EBF2 

Fibroblasts from young mice at passage 3 were plated at a density of 
20,000 cells per well in a 12-well plate. One day after plating, cells were 
infected by lentiviruses expressing Ebf2 or a vector control. To produce 
lentiviruses (see ‘Lentiviral production for reprogramming’), the following 
vectors were used: 20 µg of pLenti-Ebf2-Myc-DDK (OriGene, 
MR224591L3) or 20 µg of pLenti-C-Myc-DDK (OriGene, PS100064), 
12.6 µg of psPAX2 (Addgene 12260) and 3.7 µg of VSV-G. After 6 h of 
transfection, the medium was replaced by 7 ml fresh growth medium. 
At 24 and 48 h after transfection, viral supernatants were collected and 
centrifuged at 3,000 r.p.m. for 15 min and subsequently transferred 
into fresh tubes. Viral supernatant, collected from two 10-cm dishes of 
HEK293T cells at both time points, was concentrated by centrifugation at 
16,500 r.p.m. for 1.5 h at 4 °C. The pellet was then resuspended in 2.5 ml 
fibroblast growth medium with polybrene (8 µg ml⁻¹, MilliporeSigma, 
TR1003G). Next, 0.35 ml of the concentrated virus was added to each well 
of fibroblasts. The medium was changed 24 h after infection, and the cells 
were maintained in fibroblast growth medium for another 24 h before 
RNA collection. RNA collection and purification, and RT-qPCR, were 
performed as described above (see ‘RT-qPCR on cultured fibroblasts’). 


Proliferation rate of young and old fibroblast cultures 

Proliferation rate was assessed by plating young and old fibroblasts at 
a density of 50,000 cells per well of a 6-well plate in fibroblast growth 
medium. Every second day for up to 6 days, independent cultures were 
trypsinized and the number of cells in the cell suspension was counted 
manually using a haemocytometer. A growth slope was determined as
the slope of the regression line based on the data points (cell numbers). 
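
The growth-slope calculation can be illustrated with a minimal Python sketch of an ordinary least-squares regression of cell number on day. The function name `growth_slope` and the example counts are hypothetical and are not taken from the study. The text does not state whether raw or transformed counts were fitted, so raw counts are assumed here.

```python
def growth_slope(days, counts):
    """Slope of the least-squares regression line of cell count on day.

    Assumption: raw (untransformed) cell counts are regressed on time.
    """
    n = len(days)
    if n != len(counts) or n < 2:
        raise ValueError("need at least two matched (day, count) points")
    mean_day = sum(days) / n
    mean_count = sum(counts) / n
    # Sum of squared deviations of the time points
    sxx = sum((d - mean_day) ** 2 for d in days)
    if sxx == 0:
        raise ValueError("all time points are identical")
    # Sum of cross-products of deviations
    sxy = sum((d - mean_day) * (c - mean_count) for d, c in zip(days, counts))
    return sxy / sxx


# Hypothetical culture counted every second day for 6 days
example_days = [0, 2, 4, 6]
example_counts = [50_000, 70_000, 90_000, 110_000]
example_slope = growth_slope(example_days, example_counts)  # cells per day
```

In this sketch the slope is expressed in cells per day, so cultures can be compared by a single number regardless of how many time points were collected.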


FACS and analysis of fibroblasts in vivo in tissues 

We isolated fibroblasts from the ears of young and old mice for FACS, 
quantification and transcriptomic analysis. In brief, ears were dissected 
from animals, cut into small fragments (around 1 mm³) and digested
in DMEM (Invitrogen, 11965-092) supplemented with 0.14 Wunsch
units ml⁻¹ of Liberase DL (Roche, 5401160001) for 30 min at 37 °C. The
fragments were washed with DMEM supplemented with 20% FBS (Gibco,
16000-044), funnelled through a 100-µm nylon mesh (Fisher Scientific,
08-771-19) and washed with fibroblast growth medium (DMEM supplemented
with 10% FBS and 1% PSQ). A second filtering was performed
using a 40-µm nylon mesh (Fisher Scientific, 08-771-1), followed by a
washing step with fibroblast growth medium. Finally, cells were washed
with FACS buffer (PBS, 1% BSA, 500 nM EDTA) and resuspended in FACS
buffer to be stained for FACS analysis. FACS analysis and sorting were
performed on a BD FACS Aria II sorter, using a 100-µm nozzle. FACS
data were analysed using FlowJo v.10.0.7. Gating was determined using
fluorescence-minus-one controls for each colour used in each FACS
experiment to ensure that positive populations were solely associated 
with the antibody for that specific marker (Extended Data Fig. 10). For 
in vivo FACS analysis and sorting the following antibodies were used: 
CD140a (BioLegend, 135905), CD90.2 (BioLegend, 105305), TER119 
(BioLegend, 116234), CD326 (Thermo Fisher Scientific, 50-163-76), CD45
(BioLegend, 103126), CD31 (BioLegend, 102422), CD202b (Thermo Fisher
Scientific, 15-5987-82), Brilliant Violet 421 streptavidin (BioLegend,
405226) and DAPI staining solution (Thermo Fisher Scientific, 62248).


Bulk RNA-seq of THY1⁻PDGFRα⁺ and THY1⁺PDGFRα⁺ cells from
the ears of young and old mice, before and after wounding 

To determine whether cells could express cytokines in vivo, we 
profiled changes in transcriptomic features in fibroblast subpopu- 
lations in tissues from young and old mice, before and after wound- 
ing (see ‘Wounding and wound healing experiments’ for details on 
wounding experiments). RNA-seq was performed on freshly isolated 
THY1⁺PDGFRα⁺Lin⁻ or THY1⁻PDGFRα⁺Lin⁻ cells (defined as
PDGFRα⁺CD45⁻CD31⁻EpCAM⁻TER119⁻TIE2⁻) (see above for isolation). Cells from 2-3
young or old mice were pooled together to obtain 500 cells of each 
population for each biological replicate. RNA isolation and generation 
of RNA-seq libraries were performed using the Clontech SmartSeq v.4 
Ultra-Low Input RNA kit (Clontech). Cells were FACS-sorted directly 
into lysis buffer and cDNA was prepared as described by the manufac- 
turer. Each cDNA library was analysed using a High Sensitivity chip on 
an Agilent 2100 Bioanalyzer. To generate sequencing libraries, 0.15 ng 
of each cDNA library was used as input for the Nextera XT kit, following
the manufacturer’s recommendations. Cells were indexed using the 
Nextera XT Index kit v.2 set A, and were subsequently multiplexed and 
sequenced on Illumina NextSeq-500 High Output Flow Cell (400 M), 
using 75-bp paired-end reads. 


Assessment of reprogramming efficiency of THY1⁺PDGFRα⁺ and
THY1⁻PDGFRα⁺ fibroblasts

To determine the intrinsic reprogramming efficiency of THY1⁺PDGFRα⁺
and THY1⁻PDGFRα⁺ fibroblasts, cells were FACS-purified (see above)
and plated at a density of 100,000 cells per well of a 6-well plate. Repro- 
gramming was induced as described above (see ‘Reprogramming of 
young and old fibroblasts to iPS cells and characterization of the iPS 
cells’). In these experiments, 0.1% gelatin-coated plates (Tribec Science,
TBS8004) without feeders were used to avoid confounding factors 
from the feeder cells. Fibroblasts infected with lentiviruses express- 
ing the OSKM factors were maintained on fibroblast growth medium 
until day 7 after replating and then switched to ES cell medium until 
days 12-13. Reprogramming efficiency was assessed by AP staining as 
described above (see ‘Assessment of reprogramming efficiency’). This 
analysis revealed that activated THY1⁺PDGFRα⁺ fibroblasts intrinsically
reprogram less efficiently than non-activated THY1⁻PDGFRα⁺
fibroblasts, and that old non-activated fibroblasts (THY1⁻PDGFRα⁺) also
reprogrammed less efficiently than young THY1⁻PDGFRα⁺ fibroblasts
(Extended Data Fig. 5q). 


Assessment of reprogramming efficiency of fibroblasts with 
swapped conditioned medium 

To assess the contribution of extrinsic factors for reprogramming effi- 
ciency, we performed experiments swapping conditioned medium. For 
the conditioned medium experiments, THY1⁺PDGFRα⁺ or THY1⁻PDGFRα⁺
fibroblasts (passages 5-9), or good or bad old fibroblasts (passage 3),
were plated at a density of 350,000-400,000 cells per 10-cm plate or at a
density of 0.5-1 million cells per 15-cm plate in fibroblast growth medium.
In parallel, cells were plated at a density of 100,000 cells per well of a 
6-well plate to induce reprogramming as described above (see ‘Repro- 
gramming of young and old fibroblasts to iPS cells and characterization 
of the iPS cells’). Starting from day 1 after infection with OSKM factors, 
conditioned medium was collected from 10-cm or 15-cm dishes from
the indicated cultures that were growing in parallel, centrifuged at
10,000 r.c.f. for 10 min at room temperature and added every day to 
the recipient cells by replacing the medium. From day 7 after replating 
onwards, ES cell medium was made using the conditioned medium from 
fibroblast cultures as a base. Owing to the positive effect of conditioned 
medium on cellular reprogramming, reprogramming efficiency was 
assessed earlier than in other experiments, at days 9-10 after infection. 
Reprogramming efficiency was assessed by AP staining as described 
above (see ‘Assessment of reprogramming efficiency’). For experiments
using THY1⁺PDGFRα⁺ or THY1⁻PDGFRα⁺ fibroblasts, comparisons
were made between pairs of THY1⁻PDGFRα⁺ and THY1⁺PDGFRα⁺ fibroblasts from
the same original culture. For experiments using conditioned medium
from THY1⁺PDGFRα⁺ or THY1⁻PDGFRα⁺ fibroblasts, conditioned
medium was collected from the THY1⁻PDGFRα⁺ or THY1⁺PDGFRα⁺
fibroblasts from the same original culture, and comparisons were 
made between the effect of the different conditioned media on the 
specific populations. 


Non-viral reprogramming 

To test whether variation in reprogramming efficiency could be owing 
to lentiviral infection, we used a non-viral reprogramming protocol. 
Non-viral reprogramming was induced using the piggyBac transposon
system containing the OSKM factors. In brief, FACS-purified
THY1⁺PDGFRα⁺ or THY1⁻PDGFRα⁺ cells (passages 4-6) were plated in a
well of a 6-well plate at a density of 100,000 cells per well and transfected
with the piggyBac transposon vector using Lipofectamine 3000 (Life
Technologies, 11668027), according to the manufacturer’s instructions. 
Transfected fibroblasts were maintained on fibroblast growth medium 
until day 7, and then switched to ES cell medium until days 16-19.
Reprogramming efficiency was calculated by counting the number of AP⁺
colonies in the well. 


Effect of cytokines and blocking antibodies on reprogramming 
efficiency 

To test how cytokines impact reprogramming efficiency, the following
recombinant cytokines and blocking antibodies were purchased
from R&D Systems and used according to the manufacturer’s
recommendations: recombinant mouse IL-6 (R&D Systems, 406-ML-025),
recombinant mouse IL-1 (R&D Systems, 401-ML-025), recombinant
mouse IL-4 (R&D Systems, 404-ML-050), recombinant mouse TNF
(R&D Systems, 410-NT-050), recombinant mouse VEGF (R&D Systems,
493-MV-025), and normal polyclonal goat IgG (R&D Systems,
AB-108-C), goat polyclonal mouse anti-IL-6 blocking antibody (R&D
Systems, AB-406-NA) and goat polyclonal mouse anti-TNF blocking
antibody (R&D Systems, AB-410-NA). For all experiments, the recombinant
cytokines were resuspended according to the manufacturer’s
instructions and used in culture medium at a final concentration
of 10 ng ml⁻¹. For the blocking antibody experiments, the blocking
antibodies were pre-incubated with the corresponding cytokine or
conditioned medium for 60-90 min before treatment. The blocking
antibodies (or control IgG) were used at a concentration of 8 µg ml⁻¹.
For these experiments, young and old fibroblasts at passage 3 were 
plated at a density of 100,000 cells per well of a 6-well plate. To avoid 
confounding factors from the feeder cells, cells were plated on 0.1% 
gelatin-coated plates (Tribec Science, TBS8004). Reprogramming was 
induced as described above (see ‘Reprogramming of young and old 
fibroblasts to iPS cells and characterization of the iPS cells’). Starting 
from day 1 after infection with OSKM factors, cells were treated with 
specific cytokines or with conditioned medium together with blocking 
antibodies. Reprogramming efficiency was calculated by counting the 
number of AP⁺ or SSEA1⁺ colonies in the wells as described above (see
‘Assessment of reprogramming efficiency’). For the cytokine experi- 
ments, comparisons were made between treated and untreated cells 
originating from the same infected pool of cells and thus infection 
efficiency was not taken into account. 


Western blot analyses

To test whether the cytokines used in this study induce their cognate 
pathways in fibroblasts, we performed western blot analyses. Young and 
old fibroblasts at passage 3 were plated at a density of 100,000-150,000 
cells in a 6-cm dish in fibroblast growth medium. After plating for 24 h,
cells were treated with the indicated cytokines or antibodies for 30 min.
Cells were then lysed directly in the culture plates using ice-cold RIPA
buffer (50 mM Tris-HCl pH 7.5, 150 mM NaCl, 2 mM EDTA, 1% NP-40,
0.1% SDS supplemented with 1 mM aprotinin, PMSF and PhosphoStop
(Pierce)), scraped and transferred to Eppendorf tubes. Following addition
of sample buffer (0.0945 M Tris-HCl pH 6.8, 9.43% glycerol, 2.36%
w/v SDS and 5% β-mercaptoethanol), samples were resolved on 10%
SDS-PAGE gels, transferred onto nitrocellulose membranes and blotted,
according to the following protocol, using the following antibodies:
phosphorylated STAT3 (Tyr705) (Cell Signaling Technology,
9145), STAT3 (Invitrogen, 44-364G), phosphorylated STAT6 (Tyr641)
(Cell Signaling Technology, 9361), STAT6 (Cell Signaling Technology,
5397), phosphorylated AKT (Ser473) (Cell Signaling Technology, 4060),
AKT (Cell Signaling Technology, 4691), phosphorylated NF-κB (Ser536)
(Cell Signaling Technology, 3033), NF-κB (Cell Signaling Technology,
8242), phosphorylated JNK1 and JNK2 (Thr183 and Tyr185) (Invitrogen,
44-682G), JNK1 (Invitrogen, 44-690G) and β-actin (Novus Biologicals,
NB600-501). Membranes were incubated with HRP-conjugated anti- 
mouse (Calbiochem, 401215) or anti-rabbit secondary (Calbiochem, 
401393) antibodies and visualized using enhanced chemiluminescence 
detection reagent (Amersham ECL, GE Healthcare). 


Wounding and wound healing experiments 

To assess the change in the wound healing ability of mice with ageing, 
young (3-4 months) and old (24-26 months) C57BL/6 male mice from 
the NIA colony were anaesthetized in standard fashion by inhalation of 
1-4% of isoflurane. The hair on the dorsal aspect of both ears was shaved
and cleaned with a 70% ethanol solution. Symmetric full-thickness skin
wounds were induced on both ears by first gently pressing a 4-mm punch
biopsy onto the dorsum of the ear at its cartilaginous base. Sharp scis- 
sors were then used to dissect away the wheel of skin while leaving the 
underlying connective tissue, cartilage and anterior skin. No dressing 
was applied post-operatively, and the wounds were allowed to heal 
without further intervention. Wound healing was assessed by imaging 
(using a standard iPhone 8S camera) every other day for 20 days. Wound 
closure was analysed by comparing the relative wound size at a given 
time to the original size immediately after the operation, performed 
as previously described. A wound was considered closed when it was
re-epithelialized for more than 95% of its original size. The rate of indi- 
vidual wound healing was determined using the average of the resultant 
measurements from both ears per mouse. FACS analysis assessing the 
percentage of activated THY1⁺PDGFRα⁺Lin⁻ cells in ears of young and
old mice, before and after wounding, revealed that the fibroblasts in
the wounds were predominantly activated (THY1⁺PDGFRα⁺) fibroblasts
(Extended Data Fig. 7d). In line with this finding, all three populations 
identified in the single-cell RNA-seq analysis of all live cells in the old 
wounds exhibited enrichment for different aspects of the activated 
fibroblast state (Extended Data Fig. 9e). 
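
The wound-closure bookkeeping described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the use of wound area as the size measure and the example values are hypothetical, and the sketch does not reproduce the image analysis used to measure the wounds.

```python
def relative_wound_size(current_area, initial_area):
    """Wound size at a given time relative to the size right after surgery."""
    if initial_area <= 0:
        raise ValueError("initial wound area must be positive")
    return current_area / initial_area


def is_closed(current_area, initial_area, threshold=0.95):
    """A wound counts as closed when more than 95% of its original size
    has re-epithelialized (threshold taken from the text)."""
    closed_fraction = 1.0 - relative_wound_size(current_area, initial_area)
    return closed_fraction > threshold


def mouse_relative_size(left_ear, right_ear):
    """Average the relative wound size of both ears of one mouse.

    Each argument is a (current_area, initial_area) pair.
    """
    sizes = [relative_wound_size(cur, init) for cur, init in (left_ear, right_ear)]
    return sum(sizes) / len(sizes)


# Hypothetical areas in mm^2 for one mouse at one time point
per_mouse = mouse_relative_size((2.0, 10.0), (4.0, 10.0))
```

Averaging the two ears before comparing animals keeps the mouse, rather than the individual wound, as the unit of analysis.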


Single-cell RNA-seq of fibroblasts from young versus old wounds 
using 10x Genomics Chromium 

To evaluate changes in the fibroblast composition of wounds with age, 
we performed single-cell RNA-seq of all live PDGFRα⁺Lin⁻ cells in the
wounded area from young and old mice, 7 days after wounding. We 
pooled cells from 10 young (3-4 months) or 10 old (24-26 months) male 
C57BL/6 mice from the NIA aged colony. FACS sorting was performed 
as described above. Cells were sorted into chilled fibroblast growth 
medium. Cells were then spun down at 300g for 5 min at 4 °C and resuspended
in fibroblast growth medium at a concentration of 263 cells per
µl. Young and old cells were loaded onto a 10x Genomics Chromium chip
as per the manufacturer’s recommendations. Reverse transcription and 
library preparation was performed using the 10x Genomics Single Cell 
v.2 kit following the 10x Genomics protocol. One library from young 
mice and one library from old mice were multiplexed and sequenced 
on one lane of Illumina NextSeq-500 High Output Flow Cell (400 M), 
using 75-bp paired-end reads. 


Single-cell RNA-seq of the entire wounds of fast- versus slow- 
healing old mice using 10x Genomics Chromium 

To determine the differences in the composition of cells from old mice 
with different wound healing trajectories, we performed single-cell 
RNA-seq of all live cells in the entire wounds of two old mice with slow- 
healing trajectories and two old mice with fast-healing trajectories, 
7 days after wounding. Mice were sedated and perfused with
20 ml of PBS with heparin sodium salt (50 U ml⁻¹) (Sigma-Aldrich) to
remove the blood, and ears were immediately harvested. Wounds were
dissected and processed as described above. Live/dead staining was
performed using 1 µg ml⁻¹ propidium iodide (BioLegend). FACS sorting
was performed on a BD FACS Aria Fusion sorter using a 100-µm nozzle.
Cells were sorted into chilled fibroblast growth medium. Cells were
then spun down at 300g for 5 min at 4 °C and resuspended in fibroblast
growth medium at a concentration of 1,000-1,500 cells per µl. Cells
were loaded onto a 10x Genomics Chromium chip as described above.
Two libraries from 2 fast old mice and two libraries from 2 slow old mice
were multiplexed and sequenced on one lane of Illumina NovaSeq 6000
S2, using 101-bp paired-end reads.


Quality control of 10x Genomics single-cell RNA-seq 

For mapping, sequences obtained using the 10x Genomics single-cell
RNA-seq platform were de-multiplexed and mapped to the mm10
transcriptome using the Cell Ranger package (10x Genomics). Cells were
removed from subsequent analysis if they expressed fewer than
500 unique genes or had more than 10% mitochondrial reads. Levels
of mitochondrial reads and numbers of unique molecular identifiers
were similar between the young and old mice (Extended Data Fig. 8a) and 
between the old mice with different wound healing capacities (Extended 
Data Fig. 8i), indicating that there was no systematic bias in the libraries 
between the conditions tested. Average gene detection in each library 
was also similar between the conditions tested (Extended Data Fig. 8a, i). 
Our study includes 13,833 total cells, with 3,036 PDGFRα⁺Lin⁻ cells from
wounds of pooled young and old mice (1,592 young cells and 1,444 old 
cells) and 10,797 cells from individual wounds from old mice with dif- 
ferent wound healing capacities (fast old 1, 2,533 cells; fast old 2, 2,376 
cells; slow old 1, 3,761 cells; slow old 2, 2,127 cells). 
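
The two cell-level filters described above (at least 500 detected genes, at most 10% mitochondrial reads) can be expressed as a short Python sketch. The `CellStats` record and the function names are hypothetical and are not part of Cell Ranger or Seurat. The per-cell summaries are assumed to have been computed already. The final lines only re-add the cell numbers reported in the text.

```python
from dataclasses import dataclass


@dataclass
class CellStats:
    """Hypothetical per-cell summary computed upstream of filtering."""
    barcode: str
    n_genes: int      # number of unique genes detected
    total_reads: int  # all reads (or UMIs) assigned to the cell
    mito_reads: int   # reads assigned to mitochondrial genes


def passes_qc(cell, min_genes=500, max_mito_fraction=0.10):
    """Keep a cell unless it has fewer than 500 genes or >10% mitochondrial reads."""
    if cell.total_reads <= 0:
        return False
    mito_fraction = cell.mito_reads / cell.total_reads
    return cell.n_genes >= min_genes and mito_fraction <= max_mito_fraction


def filter_cells(cells):
    """Return only the cells that pass both thresholds."""
    return [cell for cell in cells if passes_qc(cell)]


# Cell numbers as reported in the text, re-added as a consistency check
sorted_cells = 1_592 + 1_444                    # young + old PDGFRa+Lin- cells
whole_wound_cells = 2_533 + 2_376 + 3_761 + 2_127  # four individual old mice
total_cells = sorted_cells + whole_wound_cells
```

The reported subtotals (3,036 and 10,797) and the total (13,833) are consistent with these sums.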


t-SNE analysis of single-cell RNA-seq datasets and identification 
of cell clusters 

To analyse the single-cell RNA-seq data, we performed t-SNE clustering 
using the Seurat R package (v.2.3.4) with the first 30 principal components.
Identification of significant clusters was performed using the
FindClusters() algorithm in the Seurat package, which uses a shared
nearest neighbour (SNN) modularity optimization-based clustering
algorithm. Marker genes for each significant cluster were found
using the Seurat function FindAllMarkers(). This analysis identified two 
main clusters of fibroblasts between young and old wounds and seven 
main clusters of cells between the old mice with different wound healing
trajectories (Fig. 4b, c and Extended Data Fig. 8b, j). Cell types were
determined using a combination of marker genes identified from the 
literature, PAGODA analysis and GO for cell types using the web-based 
tool Enrichr (http://amp.pharm.mssm.edu/Enrichr/). We note that some 
known components of the skin (for example, keratinocytes and epithelial 
cells) were not identified in these wounds, similar to a recent single-cell
RNA-seq study on dorsal skin after wounding in young animals. This
could be owing to wound composition, dissociation properties and
survival during the FACS sorting protocol, as previous single-cell RNA-seq
studies that identified epithelial cells in skin have either specifically
isolated epithelial cells using FACS or used a different isolation protocol
on unwounded skin.


PAGODA on single-cell RNA-seq data from wounds of young and 
old mice or of old mice with different wound healing trajectories 
We performed PAGODA analyses using raw counts for all genes that were 
considered expressed for analyses of individual datasets (Fig. 4d and 
Extended Data Figs. 8c, 9d) and using Seurat normalized counts for the 
combined analysis (Extended Data Fig. 9k). We performed three separate
analyses: (1) young compared to old PDGFRα⁺Lin⁻ cells (Extended
Data Fig. 8c); (2) cells identified as fibroblasts by Seurat from wounds of
fast- compared to slow-healing old mice (Extended Data Fig. 9d); and (3)
combined fibroblasts from both datasets (all PDGFRα⁺Lin⁻ cells together
with the cell cluster identified as fibroblasts by Seurat) (Extended Data 
Fig. 9k). For gene sets, we used all KEGG pathways as well as the in vitro 
fibroblast ageing and fibroblast activation gene sets described above
(see Supplementary Table 2b, f). We used the PAGODA pipeline with
default parameters, unless stated otherwise, and used the SCDE package
v.1.99.1 in R v.3.3. We did not account for cell cycle in these analyses. We
noted that fibroblast subpopulation B did not contain cells from the old/young
dataset in the combined analysis. This is probably owing to the fact that
this subpopulation of fibroblasts has some markers of the haematopoietic
lineage (Extended Data Fig. 9d, k), and is probably depleted by the
PDGFRα⁺Lin⁻ FACS-sorting technique that we used to isolate fibroblasts
from the wounds of young and old mice. 


Violin plots for gene expression of single cells 
To visualize the expression of individual genes, cells were grouped by cell 
type (as determined by PAGODA). The log-transformed and normalized 
gene expression values as calculated by Seurat were plotted for each 
cell as a violin plot with an overlying dot plot in R.


Statistical analysis 

For most experiments, young and old mice or samples were processed in 
an alternate manner rather than in two large groups, to minimize group
effects. Although we did not do a bona fide power analysis, we took 
into account previous experiments to estimate the number of animals 
needed in each experiment. The exception is the wound healing experiment
in Fig. 4, in which a power analysis was performed based on an initial
experiment to determine the sample size required to detect a difference 
in variability with a 95% confidence interval. For all quantifications that 
were done with FACS or automated image quantification, no blinding 
was performed, including Fig. 3a, b and Extended Data Figs. 5b, 7d. 
The other experiments were blinded, with the exception of Fig. 1c 
and Extended Data Figs. 1l, m, 5e, 6e-h, k. Statistical analysis of the 
differences between age groups was performed using an unpaired
two-tailed nonparametric Wilcoxon rank-sum test or a paired two- 
tailed or a one-tailed nonparametric Wilcoxon signed-rank test, unless 
otherwise stated. The statistical test applied was determined before 
performing the experiments. In cases in which the same recipient 
fibroblast culture was used in two independent experiments (Extended 
Data Figs. 5d, q, u, 6g), an average of the resultant measurements was 
determined. The nonparametric Fligner-Killeen test was used to test
for differences in variance in reprogramming efficiency. P values were
corrected for multiple hypothesis testing using Benjamini-Hochberg
correction, unless otherwise stated, and were considered significant
when P < 0.05.
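
For readers unfamiliar with the Benjamini-Hochberg procedure, the following is a minimal Python sketch of the standard step-up adjustment. It is a generic textbook implementation, not the authors' code (the study's analyses were run in R), and the function name and example P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P values from four comparisons
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adj]  # threshold used in the text
```

Each adjusted value is the raw P value scaled by the number of tests over its rank, capped so that adjusted values never decrease as raw values increase.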


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


All raw sequencing reads for population RNA-seq, ChIP-seq and single- 
cell RNA-seq data can be found under BioProject PRJNA316110. The 
command and configuration files, in addition to a list of all versioned 
dependencies present in the running environment, are available on 
the GitHub repository for this paper
(https://github.com/brunetlab/Mahmoudi_et_al_2018) (except for the code for the
processing of metabolomics data, which is available upon request).
Extended Data Fig. 1 | Primary old fibroblasts from mouse ear, mouse lungs
and human skin secrete high levels of inflammatory cytokines, and the 
ability of individual cultures from ear fibroblasts to reprogram is 
stereotypical. a, Cytokine profiling of plasma from young (3 months, n=21) and 
old (29 months, n=19) mice using Luminex multiplex cytokine assay
(2 independent experiments). Box-and-whisker plot of log₂-transformed fold
change in MFI over median of young fibroblasts. Box plots depict the median
and interquartile range, with whiskers indicating minimum and maximum
values. *P < 0.05, **P < 0.01, ***P < 0.001, n.s., not significant; two-tailed Wilcoxon
rank-sum test with Benjamini-Hochberg correction. Exact P values are in
Supplementary Table 1a. b, Blood cell composition analysis of plasma from 
young (3 months, n=9) and old (29 months, n= 9) mice using the Hemavet 
Multispecies Hematology Analyzer (2 independent experiments). Data are cell 
numbers per µl in whole-blood samples. Each dot represents cells from one
mouse. Lines depict median. P values, two-tailed Wilcoxon rank-sum test with 
Benjamini-Hochberg correction. c, Percentage of fibroblasts (PDGFRα⁺) and
immune cells in young (3 months, n=8) and old (29 months, n=8) fibroblast 
cultures at passage 3 (1 experiment), as determined by FACS using the indicated 
cell-type-specific surface markers. Data are mean ± s.e.m. Primary splenocytes
from an 8-week-old mouse were used as a positive control (right). d, Comparison
between cytokine profiles of plasma (red triangles) and conditioned medium 
from cultured ear fibroblasts at passage 3 (orange squares). Results are the mean 
log₂-transformed concentrations (pg µl⁻¹) of cytokines detected. For exact
concentrations, see Supplementary Table 1a, b. n.d., not detected. e, Cytokine 
profiles of conditioned medium from primary cultures (passage 3) of lung 
fibroblasts from young (3 months, n=8) and old (20-24 months, n=9) mice
(2 independent experiments). Box-and-whisker plot of log₂-transformed fold
change in MFI over median of young fibroblasts. *P < 0.05, **P < 0.01; two-tailed
Wilcoxon rank-sum test with Benjamini-Hochberg correction. Box plots as in a.
Exact P values are in Supplementary Table 1c. f, Cytokine profiles of conditioned
medium from primary fibroblast cultures isolated from punch biopsy of pre- 
auricular skin of healthy human subjects of different ages. Results are shown as 
Spearman’s rank correlation coefficient (ρ) between donor age (years) and
cytokine levels (MFI) in human fibroblast cultures (n=8) at passage 3 (1
experiment). Each dot represents cells from one individual. P values, two-sided
algorithm AS 89 in R. For multiple hypothesis testing, see Supplementary
Table 1d. g, Cytokine profiles of conditioned medium collected from passage 23
cultures of iPS cell lines derived from young (3 months, n=4) and old (29 months,
n=6) fibroblasts (1 experiment). Box-and-whisker plot of log₂-transformed fold
change in MFI over median of young iPS cells. Only cytokines that were detected 
at significantly different levels in young and old fibroblasts are shown (for a
complete cytokine list and more details, see Supplementary Table 1e, f). Each dot
represents an individual iPS cell line. P values, one-tailed Wilcoxon rank-sum test
with Benjamini-Hochberg correction. Exact P values are in Supplementary
Table 1e. Box plots as in a. h, Comparison of age-dependent changes in cytokine
levels between plasma-incubated (described in a) and conditioned-medium-incubated
mouse fibroblasts (described in Fig. 1b), their derived iPS cells
(described in g) and from human fibroblasts (described in f) (Supplementary
Table 1a, b, d, e) based on cytokines that are significantly different in conditioned 
medium from fibroblasts (Fig. 1b, bottom). Top (also presented in Fig. 1b),
ranked fold change (old/young) in levels of the indicated cytokines in plasma,
conditioned medium from mouse primary fibroblasts and iPS cells. Bottom,
ranked Spearman ρ correlations for the indicated cytokines in conditioned
medium from human primary fibroblasts (see f for individual ρ values). i, j, iPS
cell lines derived from young and old mice show typical morphologies of mouse 
iPS cells, express similar levels of pluripotency markers and can give rise to cell 
types from all three germ layers upon embryoid body formation. i, 
Representative immunofluorescence images of iPS cell lines derived from 
young and old mice at passage 23, stained with the indicated antibodies 
(1 experiment). j, RT-qPCR on the indicated genes in embryoid bodies
differentiated in vitro from iPS cell lines from young (n=5) and old (n=8) mice at 
passage 23 (1experiment). Expression is presented as expression relative to the 
housekeeping gene Hprt1. Each bar represents one iPS cell line. k, Cytokine 
profiles of conditioned medium collected from cultures of young (n=7) and old 
(n=7) ear fibroblasts at passage 33 (1 experiment). Box-and-whisker plot of
log₂-transformed fold change in MFI over median of young fibroblasts. Only the
cytokines that exhibited a significant difference in expression levels in
conditioned medium from young and old fibroblasts at passage 3 are shown.
P values, one-tailed Wilcoxon rank-sum test with Benjamini-Hochberg
correction. For a complete cytokine list and exact P values, see Supplementary
Table 2r. Box plots as in a. Note that the experiments in fibroblasts at passage 3
and 33 were conducted independently, and therefore statistical comparisons 
were restricted to within experiments. However, a direct comparison between 
the levels of secreted factors at passage 3 to 33 revealed that the levels of most 
cytokines decrease upon passaging. l, Reprogramming efficiency, assessed by
SSEA1 staining, of young (n=14) and old (n=24) ear fibroblast cultures at
passage 3 (3 independent experiments), as log₂-transformed fold change over
the median of young mice. Each dot represents a fibroblast culture from one
mouse. P value, Fligner-Killeen test to assess differences in variance between
age groups. m, Reprogramming efficiency assessed by AP staining of young
(3 months, n=7), middle-aged (12-13 months, n=7) and old (28-30 months,
n=8) chest fibroblast cultures at passage 3 (2 independent experiments), as
log₂-transformed fold change over the median of young mice. Dots as in l. P values,
Fligner-Killeen test to assess differences in variance between age groups with
Benjamini-Hochberg correction. n, Reprogramming efficiency of fibroblast
cultures is mainly stereotypical to fibroblast cultures from an individual
mouse. Correlation plot depicting the reprogramming efficiency (assessed by 
AP staining) of fibroblast cultures reprogrammed in two experiments 
(separated by more than one month), with data from experiment 1 on the x axis
and data from experiment 2 on the y axis. Data shown are from young (3 months,
n=14), middle-aged (12 months, n=6) and old (29 months, n=18) mice. Dots as in
l. P values, two-sided algorithm AS 89 in R. There was a positive correlation
(Spearman rank correlation, ρ = 0.63, P = 2.1 × 10⁻⁵) between reprogramming
efficiencies of fibroblast cultures from the same mouse. o, Correlation plot
depicting the ability of young (n=4) and old (n=5) fibroblast cultures at passage
3 to reprogram into neurons (iN, assessed by PSA-NCAM) or to iPS cells (iPSC,
assessed by AP) (1 experiment). Dots as in l. P values, two-sided algorithm AS 89
in R. There was a significant positive correlation (Spearman rank correlation,
ρ = 0.84, P = 0.003) between these two features. For individual experiments in
a, b, e, l-n, see Supplementary Table 7.
Extended Data Fig. 2 | Old fibroblasts exhibit distinct transcriptomic,
epigenomic and metabolomic profiles compared to young fibroblasts.

a, b, PCA of H3K4me3 (a) and H3K27me3 (b) peak intensities from young ear 
fibroblast cultures (3 months, n=2), old cultures that reprogram well (good old, 
29 months, n=2) and old cultures that reprogram poorly (bad old, 29 months, 
n=1) (1 experiment). Principal components (PCs) 1 and 2 are shown. c, PCA of
metabolomes of young (n=8), good old (n=4) and bad old (n=4) ear fibroblast
cultures (1 experiment). Principal components 1 and 2 are shown. d, PCA of
metabolomes of young (n=8), good old (n=4) and bad old (n=4) ear fibroblast
cultures (1 experiment). Principal components 1 and 3 are shown. e,
Unsupervised hierarchical clustering of transcriptomes (RNA-seq) of young 
(n=8), good old (n=5) and bad old (n=5) ear fibroblast cultures (3 independent 
experiments). Hierarchical clustering was performed using correlation-based 
dissimilarity (Pearson’s) as distance measure and average for linkage analysis. 
The yaxis indicates the similarity between samples. b-e, Mice were the same 
ages as ina. f-h, Unsupervised hierarchical clustering of H3K4me3 (f) and 
H3K27me3 (g) peaks and metabolomes (h) described ina, bandc, respectively. 
Hierarchical clustering was performedas ine. i, PCA of transcriptomes (RNA- 
seq) of good and bad old ear fibroblast cultures. Principal components land 2 
are shown.j, PCA of metabolomes of good old (n= 4) and bad old (n=4) ear 
fibroblast cultures. Principal components 2 and 3 are shown. k, Selected GO 
terms enriched in the old transcriptomes (young, n=8; old, n=10), with 
corresponding FDR-adjusted Pvalues (one-sided Fisher’s exact test with 
Benjamini-Hochberg correction). For acomplete list, see Supplementary 
Table 2d.1, Heat map showing expression (VST-transformed read counts, scaled 
row-wise) of selected cytokine genes. The scale for expression fold changes is 
indicated onthe left. m, Gene set enrichment analysis (GSEA) plot depicting the 
transcriptomes of young (n= 8) and old (n=10) fibroblasts. Top, genes 
associated with fibroblast activation (see Supplementary Table 2f) are enriched 
inthe old fibroblasts (P< 1.0 x10, two-sided nominal Pvalue). Bottom, heat 
map shows expression of fibroblast activation genes (VST-transformed read 
counts, scaled row-wise). The scale for expression fold changes is indicated on 
the left. n—q, Analysis of the epigenomic data as described ina, b. Inline with the 
transcriptomic data, age-dependent changes in epigenomic landscape also
revealed enrichment of pathways involved in fibroblast activation, such as
cytokines, extracellular matrix components and contractility-related features.
n, Left, heat map shows the H3K4me3 peaks within promoter regions that
exhibit a significant difference in intensity with age, assessed by DiffBind. Peak
intensity is shown as VST-transformed read counts, scaled row-wise. The scale
for peak intensity fold changes is indicated on the left. Right, selected enriched
KEGG pathways colour coded according to significance (one-sided Fisher’s exact
test with Benjamini-Hochberg correction; black, FDR-adjusted P < 0.05; grey,
FDR-adjusted P < 0.15). For a complete list of significant KEGG terms, see
Supplementary Table 2h. o, Top, Venn diagram depicts the overlap of bivalent
domains within promoter regions of young and old fibroblasts. For details, see
‘Chromatin immunoprecipitation followed by sequencing and analysis of the
epigenomic landscape’. Middle, pie charts show how the unique bivalent
domains in young fibroblasts change in old fibroblasts (left pie chart) and vice
versa (right pie chart). Bottom, selected enriched KEGG pathways colour coded
according to significance. For a complete list of KEGG terms, see Supplementary
Table 2i. p, Top, Venn diagram depicting the overlap of broad H3K4me3 domains
within promoter regions of young and old fibroblasts. For details, see
‘Chromatin immunoprecipitation followed by sequencing and analysis of the
epigenomic landscape’. Bottom, selected enriched KEGG pathways colour
coded according to significance. For a complete list of significant KEGG terms,
see Supplementary Table 2j. q, The relationship between changes in H3K4me3
peak intensity (as described in a) and gene expression for H3K4me3 peaks that
are significantly different between the age groups (as described in Fig. 2b). The
analysis was restricted to H3K4me3 peaks that are within promoter regions,
defined as the transcriptional start site ±2 kb. The y axis denotes the log2-transformed
fold change in gene expression between young and old fibroblasts
for the gene, and the x axis denotes the log2-transformed fold change of H3K4me3
peak intensity assigned to the gene. Genes of interest are labelled. r-t, Pathway
analysis of all putatively identified metabolites that were significantly different
between young and old fibroblasts (described in c), as well as the differentially
expressed genes (described in Fig. 2b; FDR-adjusted P < 0.05, absolute fold
change > 1.5), using the MetaboAnalyst tool. Note that the MetaboAnalyst tool
does not provide multiple-hypothesis-corrected P values (one-sided Fisher’s
exact test). s, Box plots showing the log2-transformed signal intensities of
selected metabolites in the arginine and proline pathway from the metabolic
profiling of young and old fibroblast cultures at passage 3 (as described in c),
for which the identity was confirmed using commercially available standards.
Box plots depict the median and interquartile range, with whiskers indicating
minimum and maximum values. *P < 0.05, **P < 0.01; two-tailed Wilcoxon rank-sum
test with q value correction; L-arginine, P = 0.069; L-ornithine, P = 0.690;
L-glutamate, P = 0.055; creatine, P = 0.022; creatinine, P = 0.002; putrescine,
P = 0.081; spermidine, P = 0.094; spermine, P = 0.016. t, Schematic
representation of the biological functions of key metabolites and genes in the
arginine and proline metabolic pathway, and how they relate to regulation of
inflammatory cytokines and extracellular matrix synthesis. Abundance of
putative metabolites (oval) and gene transcripts (squares) in old fibroblasts is
colour coded (red, higher in old; blue, lower in old; grey, not significantly
different or not detected). Epigenomic changes are indicated with black
asterisks. u, Top, top 3 motifs found in promoters of differentially expressed
genes between young and old fibroblasts described in Fig. 2b, using HOMER
motif analysis. Bottom, top 10 putative upstream regulators identified by the IPA
database that are differentially expressed between young and old fibroblasts
(FDR-adjusted P < 0.05, absolute fold change > 1.5). Heat map depicts log2-transformed
fold change in expression (old/young) calculated using DESeq2.
The transcription factor identified across both analyses (EBF2) is in red.
*P < 0.05, **P < 0.01, ***P < 0.001; cumulative hypergeometric distribution
(HOMER motif analysis), activation z-score in IPA (upstream regulator
analysis). For a complete list of significant motifs and upstream regulators, and
exact P values, see Supplementary Table 2n. v, Top, heat map of differentially
expressed genes in a regression analysis from young to old healthy human
primary fibroblasts (n = 13, FDR-adjusted P < 0.05). Expression is shown as VST-transformed
read counts, scaled row-wise. The depicted KEGG pathways are
FDR-adjusted P < 0.15 (one-sided Fisher’s exact test with Benjamini-Hochberg
correction) (Supplementary Table 2o, p). Bottom, VST-transformed expression
of EBF2 across human samples as a function of age (years). Each square
represents transcripts from a patient. NB, newborn. w, Pathway enrichment
analysis of KEGG pathways associated with enhanced (up) or reduced (down)
reprogramming, comparing H3K4me3 peak intensities between the top
(n = 1) and the bottom (n = 1) old reprogramming cultures in our datasets (see
Supplementary Table 2a). All depicted KEGG pathways were significantly
enriched (FDR-adjusted P < 0.05). *P < 0.05; two-sided nominal P value with
Benjamini-Hochberg correction. For a complete list of KEGG terms and exact
P values, see Supplementary Table 3f.
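
The clustering recipe named in this caption (correlation-based dissimilarity with average linkage) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the dissimilarity is 1 minus Pearson's r, and the sample names and profiles are hypothetical. Real analyses would normally rely on an established statistics package rather than this brute-force loop.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def average_linkage(profiles):
    """Agglomerative clustering of named profiles.

    Distance between samples is 1 - Pearson r (an assumption about what
    'correlation-based dissimilarity' means); distance between clusters is
    the mean of all pairwise sample distances (average linkage).
    Returns a list of (cluster_a, cluster_b, merge_height) tuples.
    """
    names = list(profiles)
    dist = {
        frozenset((a, b)): 1.0 - pearson(profiles[a], profiles[b])
        for a, b in combinations(names, 2)
    }

    def linkage(c1, c2):
        return mean(dist[frozenset((a, b))] for a in c1 for b in c2)

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        # Merge the pair of clusters with the smallest average distance.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical expression profiles (four genes per sample).
profiles = {
    "young_1": [1.0, 2.0, 3.0, 4.0],
    "young_2": [2.0, 4.0, 6.0, 8.5],
    "old_1": [4.0, 3.0, 2.0, 1.0],
    "old_2": [8.0, 6.5, 4.0, 2.0],
}
merges = average_linkage(profiles)
for left, right, height in merges:
    print(left, right, round(height, 3))
```

Because the distance depends only on correlation, two samples with the same expression pattern but different overall magnitude are treated as close, which is the usual motivation for this choice with transformed read counts.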


Extended Data Fig. 3 | Figure panels a-l; see caption below. Panel titles: RNA-seq iPSCs (clustering, fibroblast-like versus embryonic stem cell-like; PCA, PC1 28%); RNA-seq fibroblasts and iPSCs (genes up in young fibroblasts, 389; in old fibroblasts, 633; in young iPSCs, 2; in old iPSCs, 3); Metabolome iPSCs (PCA, PC1 25%); Metabolome fibroblasts and iPSCs (metabolites up in young fibroblasts, 189; in old fibroblasts, 295; in young iPSCs, 0; in old iPSCs, 0); RNA-seq iPSCs based on differentially expressed genes in fibroblasts (PCA, PC1 24%); RT-qPCR on early passage fibroblasts (P3); RT-qPCR on late passage fibroblasts (P33); RT-qPCR on iPSCs (P23).
Extended Data Fig. 3 | Reprogramming erases features of inflammageing
and variability between mice. a-l, To test whether transcriptomic and
metabolomics features of inflammageing could be erased by reprogramming,
iPS cell lines from young and old fibroblasts at passage 23 were profiled for their
transcriptome (RNA-seq) and metabolome. a, All iPS cell lines (n = 11) cluster
with previously established bona fide iPS cell lines (n = 3) and ES cells (n = 4).
Unsupervised hierarchical clustering based on overall transcriptomes of the
indicated cell types. The hierarchical clustering was performed using
correlation-based dissimilarity (Pearson’s) as distance measure and average for
linkage analysis. The y axis indicates the similarity between samples. b, PCA of
whole transcriptomes from RNA-seq data of iPS cell lines derived from young
(3 months, n = 5) and old (29 months, n = 6) mice at passage 23. Principal
components (PCs) 1 and 2 are shown. c, Unsupervised hierarchical clustering of
transcriptomes described in b. Hierarchical clustering was performed as in a.

d, Strip plot illustrating the log2-transformed fold expression changes of all
genes with age for fibroblasts (left; described in Fig. 2b) and iPS cells (right;
described in b). Genes detected as significantly upregulated or downregulated
(DESeq2, FDR-adjusted P < 0.05, absolute fold change > 1.5) with age are shown
in blue and yellow, respectively (see Supplementary Table 2q). e, PCA of
metabolomes of iPS cell lines derived from young (n = 5) and old (n = 8) mice at
passage 23. Ages as in b. Untargeted metabolomics profiles were generated
using ultra-high performance liquid chromatography-mass spectrometry.
Principal components 1 and 2 are shown. f, Unsupervised hierarchical clustering
of metabolomics profiles described in e. Hierarchical clustering was performed
as in a. g, Strip plot illustrating the log-transformed fold change in signal
intensity of all metabolic features with age for fibroblasts (left; described in
Extended Data Fig. 2c) and iPS cells (right; described in e). Metabolic features
detected as significantly up or down (using a two-tailed Wilcoxon rank-sum test
with q value correction) with age are shown in blue and yellow, respectively.

h, i, PCA (h) and unsupervised clustering (i) of iPS cell lines derived from young
(n = 5) and old (n = 6) mice at passage 23, based solely on the genes that were
significantly differentially expressed between young and old at fibroblast level.
Ages as in b. Principal components 1 and 2 are shown. Hierarchical clustering was
performed as in a. j-l, RT-qPCR of the indicated genes in fibroblast cultures at
passage 3 (j) and 33 (k), and iPS cell cultures at passage 23 (l). The genes shown
represent the three major groups of features associated with fibroblast
activation that change with age in fibroblasts. Box-and-whisker plot of log2-transformed
fold change in MFI over median of young fibroblasts. Box plots
depict the median and interquartile range, with whiskers indicating minimum
and maximum values. Data are from young (n = 6) and old (n = 6) fibroblast
cultures at passage 3, young (n = 5) and old (n = 6) fibroblast cultures at passage
33, young (n = 6) and old (n = 7) iPS cell cultures at passage 23. Ages as in b.
*P < 0.05, **P < 0.01; one-tailed Wilcoxon rank-sum test with Benjamini-Hochberg
correction. For j: Ccl7 (also known as Mcp3), P = 0.004; Ccl2 (also
known as Mcp1), P = 0.004; Acvr2a (which encodes ACVR2α), P = 0.006; Ccl11
(also known as Eotaxin), P = 0.004; Pak6, P = 0.004; Thsb2, P = 0.004; Actn3,
P = 0.006; Col1a1 (which encodes COL1α1), P = 0.004; Acta2 (which encodes
αSMA), P = 0.004; Lama2, P = 0.004; Dmd, P = 0.008; F2r, P = 0.008. For k: Ccl7,
P = 0.027; Ccl2, P = 0.027; Acvr2a, P = 0.027; Ccl11, P = 0.049; Pak6, P = 0.027;
Thsb2, P = 0.026; Actn3, P = 0.026; Col1a1, P = 0.035; Acta2, P = 0.027; Lama2,
P = 0.027; Dmd, P = 0.027; F2r, P = 0.229. For l: Ccl7, P = 0.800; Ccl2, P = 0.800;
Acvr2a, P = 0.800; Ccl11, P = 1.000; Pak6, P = 0.800; Thsb2, P = 1.000; Actn3,
P = 0.800; Col1a1, P = 1.000; Acta2, P = 1.000; Lama2, P = 1.000; Dmd, P = 1.000;
F2r, P = 1.000. Note that the experiments were conducted independently in
fibroblasts at passage 3, 33 and iPS cells, and therefore the statistical
comparisons indicated were restricted to each independent experiment.
However, a comparison between the expression of secreted factors at passage 3
to 33 shows that expression of Ccl11, but not Ccl2 and Ccl7, significantly
decreases upon passaging.
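
The Benjamini-Hochberg correction referred to throughout these captions can be sketched in a few lines. This is an illustrative implementation of the standard step-up adjustment, not the authors' code; the function name and the nominal P values below are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order.

    Each P value is scaled by m / rank (rank 1 = smallest), then a running
    minimum taken from the largest rank downwards keeps the adjusted values
    monotone in the original ordering of the P values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical nominal P values for five genes.
nominal = [0.001, 0.008, 0.039, 0.041, 0.6]
adjusted = benjamini_hochberg(nominal)
print(adjusted)
```

One consequence of the running minimum is that several tests can end up sharing the same adjusted value, so identical adjusted P values across genes are not in themselves a sign of an error.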
Extended Data Fig. 4 | Correlation between the fibroblast activation
signature and reprogramming efficiency in single-cell RNA-seq data. 

a, Single-cell RNA-seq of young (n = 30 cells), good and bad old (n = 31 cells)
fibroblast cultures. t-SNE was performed on the VST-transformed read counts of
all detected genes (analysed using DESeq2). Each dot represents a single
fibroblast transcriptome. b, PAGODA of single-cell RNA-seq data from young
and old fibroblasts at passage 3 performed using raw expression counts and all
KEGG pathways, the in vitro fibroblast ageing, the fibroblast activation and
de novo gene sets. Hierarchical clustering is based on 97 significantly
overdispersed gene sets and the 405 genes driving the significantly 
overdispersed gene sets. Top, heat map of single cells from young and old 
fibroblast cultures. Middle, heat map of the separation of cells based on their 
principal component scores for the significantly overdispersed gene sets. Top 
heat map, PAGODA clustering of cells. Maroon and blue colours indicate 
increased and decreased expression of the associated gene sets, respectively. 
c, PAGODA as described in Extended Data Fig. 4c. Top, heat map of single cells
from young and old fibroblast cultures. Middle, heat map of separation of cells
based on their principal component scores for the significantly overdispersed
gene sets. Top heat map, PAGODA clustering of cells. Maroon and blue colours
indicate increased and decreased expression of the associated gene sets,
respectively. d, PAGODA as described in c. Middle, heat map of separation of
cells based on their principal component scores for the in vitro fibroblast ageing
signature. Bottom, heat map of the expression of the genes that are part of the
in vitro fibroblast ageing signature, and decrease with age; expression is shown
as VST-transformed read counts, scaled row-wise. The scale for expression fold
changes is indicated on the right. The bottom heat map indicates the cells that
originate from good and bad old cultures. e, PAGODA as described in c. Middle,
heat map of separation of cells based on their principal component scores for 
the in vitro fibroblast ageing signature. Bottom, heat map of expression of the 
genes that are part of the in vitro fibroblast ageing signature, and increase with 
age; expression is shown as VST-transformed read counts, scaled row-wise. The 
scale for expression fold changes is indicated on the right. The bottom heat map 
indicates the cells that originate from good and bad old cultures. f, PAGODA as 
described in c. Middle, heat map of the separation of cells based on their
principal component scores for the fibroblast activation signature. Bottom,
heat map of the expression of the genes that are part of the fibroblast activation
gene set; expression is shown as VST-transformed read counts, scaled row-wise.
The scale for expression fold changes is indicated on the right. The bottom heat
map indicates the cells that originate from good and bad old cultures.
g, PAGODA as described in c. Middle, heat map of the separation of cells based
on their principal component scores for the KEGG cytokine-cytokine receptor
interaction gene set. The heat map shows expression of the top 30
overdispersed genes in the KEGG cytokine-cytokine receptor interaction
pathway; expression is shown as VST-transformed read counts, scaled row-wise.
The scale for expression fold changes is indicated on the right. The bottom heat 
map indicates the cells that originate from good and bad old cultures. 
Extended Data Fig. 5 | Old fibroblast cultures are enriched for activated
(THY1+PDGFRα+) fibroblasts, which are intrinsically poor at reprogramming
but facilitate reprogramming extrinsically via secretion of cytokines. 

a, Left, representative immunofluorescence images of young and old fibroblasts 
at passage 3 stained for αSMA (which is encoded by the Acta2 gene). Right,
quantification of the percentage of αSMA+ cells in young (3 months, n = 8) and
old (29 months, n = 8) fibroblasts at passage 3 (1 experiment). Based on
reprogramming efficiency, old cultures are shown as good old, bad old or old.
Each dot represents cells from one mouse. Lines depict median. P value, two-tailed
Wilcoxon rank-sum test. b, Left, representative immunofluorescence
images of young and old fibroblasts at passage 3 incubated with EdU for 4 h, then
stained for αSMA (red), EdU (green) and DAPI (blue). White arrows indicate an
EdU-positive activated and a non-activated cell. Right, FACS quantification of
the percentage of EdU-positive THY1−PDGFRα+ (THY1−) and THY1+PDGFRα+
(THY1+) cells in young (3 months, n = 5; 3 independent experiments) and old
(29 months, n = 5; 1 experiment) cultures at passage 3. Dots and lines as in a.
P values, two-tailed Wilcoxon rank-sum test. c, PAGODA of single-cell RNA-seq
from young and old fibroblasts described in Extended Data Fig. 4c. Top heat 
map, PAGODA clustering of cells. Maroon and blue colours indicate increased 
and decreased expression of the associated gene sets, respectively. Middle heat 
map, expression of genes in the GO cellular senescence gene set, for which 
expression is shown as VST-transformed read counts, scaled row-wise. Bottom 
heat map, cells that originate from good and bad old cultures. The scale for 
expression fold changes is indicated on the right. d, RT-qPCR of p16INK4a
expression in cultures of THY1−PDGFRα+ (THY1−) and THY1+PDGFRα+
(THY1+) young and old cells at passages 4-6. Results are shown as fold change in
expression over THY1−PDGFRα+ cells. Data are from young (3 months, n = 5) and
old (29 months, n = 5) cultures (6 independent experiments). One young and
three old cultures were used in 2-3 independent experiments. In this case, an
average of the measurements was determined. Dots and lines as in a. P values,
two-tailed Wilcoxon signed-rank test. e, Percentage of SA-β-galactosidase-positive
cells in young (3 months, n = 11) and old (28-29 months, n = 22) fibroblast
cultures at passage 3 (3 independent experiments). log2-transformed fold
change in SA-β-galactosidase-positive cells over median of young fibroblasts.
Line indicates median. P values, two-tailed Wilcoxon rank-sum test. f, g, RT-qPCR
of old THY1−PDGFRα+ (THY1−) and THY1+PDGFRα+ (THY1+) cells at passage
4-6 (3 independent experiments) untreated (f) (n = 6, 3 independent
experiments) or treated with the indicated shRNA constructs for 72 h (g) (n = 5, 4
independent experiments). Box-and-whisker plot of fold change in expression
over THY1−PDGFRα+ populations originating from the same culture (f) or over
shLuciferase (shLuc) treated cells (g). Box plots depict the median and
interquartile range, with whiskers indicating minimum and maximum values.
*P < 0.06, one-tailed Wilcoxon signed-rank test with Benjamini-Hochberg
correction. h, RT-qPCR on young (3 months, n = 5) fibroblasts after
overexpression of Ebf2 for 48 h (2 independent experiments). Box-and-whisker
plot of fold change in expression over cells treated with empty vector. Box plots
as in f. *P < 0.06, one-tailed Wilcoxon signed-rank test with Benjamini-Hochberg
correction. i, Heat map of significantly differentially expressed genes
(determined by DESeq2) between freshly FACS-sorted THY1−PDGFRα+Lin−
(THY1−) and THY1+PDGFRα+Lin− (THY1+) cells described in Fig. 3b and enriched
KEGG pathways. Expression is shown as VST-transformed read counts, scaled
row-wise. The scale for expression fold changes is indicated on the left. All
depicted KEGG pathways were significantly enriched (one-sided Fisher's exact
test with Benjamini-Hochberg correction, FDR-adjusted P < 0.05). For a
complete list of KEGG terms, see Supplementary Table 4e. j, Pathway
enrichment analysis of KEGG pathways associated with ageing in dataset
described in Fig. 3b. For a complete list of KEGG terms, see Supplementary
Table 4g. **P = 0.01, ***P = 0.001; two-sided nominal P value with Benjamini-Hochberg
correction. k, Top, ranked fold change (old/young) in levels of the
indicated cytokines in plasma (see Extended Data Fig. 1a). Bottom, ranked fold
change (old/young) in expression for the indicated cytokines in freshly FACS-sorted
THY1−PDGFRα+Lin− (THY1−) and THY1+PDGFRα+Lin− (THY1+) cells from
young and old ears. See ‘Cytokine profiling analysis on plasma and conditioned
medium using Luminex multi-analyte’ for calculation of ranked fold changes.
Gene expression related to wounded fibroblasts is from datasets described in
Extended Data Fig. 7e-g. l-n, Correlation between the proportion of
THY1+PDGFRα+ (THY1+) fibroblasts in old cultures (29 months, n = 23) (l), young
(3 months, n = 21) and bad old (29 months, n = 6) cultures (m), and young
(3 months, n = 21) and good old (29 months, n = 6) cultures (n), and the
reprogramming efficiency of the culture (3 independent experiments). Dots as
in a. P values, two-sided algorithm AS 89 in R. The y axis denotes the fold change
in the proportion of THY1+PDGFRα+ fibroblasts relative to the median of young
mice, and the x axis denotes the fold change in reprogramming efficiency of the
culture relative to the median of young mice. o, p, Correlation between the
proliferation rate (o) or the percentage of SA-β-galactosidase-positive cells (p) of
a given fibroblast culture and reprogramming efficiency of the culture.
Proliferation rate was determined by calculating the growth slope of young
(3 months, n = 15), middle-aged (12 months, n = 10) and old (28-29 months, n = 27)
ear fibroblast cultures at passage 3 (4 independent experiments). Senescence
was assessed by SA-β-galactosidase staining of young (3 months, n = 11), middle-aged
(12 months, n = 11) and old (28-29 months, n = 22) ear fibroblast cultures at
passage 3 (3 independent experiments). Dots as in a. P values, two-sided
algorithm AS 89 in R. The y axis denotes the fold change in the proliferation rate
or percentage of SA-β-galactosidase-positive cells relative to the median of
young mice, and the x axis denotes the fold change in reprogramming efficiency of
the culture relative to the median of young mice. q, Reprogramming efficiency
of FACS-sorted young (3 months, n = 8) and old (29 months, n = 7) THY1−PDGFRα+
(THY1−) and THY1+PDGFRα+ (THY1+) fibroblasts at passages 4-6, assessed using
AP staining (3 independent experiments). log2-transformed fold change in
reprogramming efficiency of the cells relative to the median of young
THY1−PDGFRα+ fibroblasts. One old culture was used in two independent
experiments. In this case, an average of the measurements was determined.
Dots and lines as in a. P values, two-tailed Wilcoxon signed-rank test. r,
Reprogramming efficiency of FACS-sorted young (3 months, n = 5)
THY1−PDGFRα+ (THY1−) and THY1+PDGFRα+ (THY1+) fibroblasts at passages 4-6,
assessed as in q (3 independent experiments). Reprogramming was induced
using a non-lentiviral piggyBac transposon system. Results are shown as number
of AP+ colonies. Dots and lines as in a. P values, one-tailed Wilcoxon rank-sum
test. s, t, Cytokine profiles of conditioned medium collected from cultures of old
(s) (29 months, n = 6, 3 independent experiments) and young (t) (3 months, n = 6,
2 independent experiments) THY1−PDGFRα+ (THY1−) and THY1+PDGFRα+
(THY1+) fibroblasts at passages 4-6. Comparisons were made between
THY1−PDGFRα+ and THY1+PDGFRα+ from the same original culture. Based on
cytokines that are significantly different in conditioned medium from
fibroblasts (Fig. 1b). Box-and-whisker plot of log2-transformed fold change in
mean fluorescence intensity (MFI) over THY1−PDGFRα+ fibroblasts. Box plots as
in f. *P < 0.05, one-tailed Wilcoxon rank-sum test with Benjamini-Hochberg
correction. Exact P values are in Supplementary Table 4h. u, Reprogramming
efficiency, assessed as in q, of FACS-sorted young (3 months, n = 6)
THY1−PDGFRα+ (THY1−) and THY1+PDGFRα+ (THY1+) fibroblasts at passages 4-6
treated with fresh conditioned medium daily starting from day 1 after infection
(3 independent experiments). Conditioned medium was collected daily from the
THY1−PDGFRα+ or THY1+PDGFRα+ fibroblasts from the same original culture.
log2-transformed fold change in reprogramming efficiency relative to the
reprogramming efficiency of THY1−PDGFRα+ fibroblasts treated with
conditioned medium from THY1−PDGFRα+ fibroblasts. One young culture was
used in two independent experiments. In this case, an average of the
measurements was determined. Dots and lines as in a. P values, two-tailed
Wilcoxon signed-rank test. For individual experiments in b, d-h, l-u and exact
P values, see Supplementary Table 7.
Extended Data Fig. 6 | Old fibroblasts secrete cytokines, including IL-6 and
TNF, that induce inflammatory signalling pathways and modulate 
reprogramming efficiency. a, Reprogramming efficiency, assessed using AP 
staining, of FACS-sorted young (3 months, n=8) fibroblast cultures at passage 3 
that were treated with conditioned medium from day 1 after infection
(4 independent experiments). Conditioned medium was collected from young,
good old or bad old fibroblast cultures. Results are shown as fold change in
reprogramming efficiency relative to young fibroblasts treated with young
conditioned medium. Each dot represents cells from one mouse. Lines depict
median. P values, two-tailed Wilcoxon signed-rank test with Benjamini-Hochberg
correction. b, Reprogramming efficiency, assessed as in a, of bad old
(left, n = 7) or good old (right, n = 6) fibroblast cultures at passage 3 treated with
conditioned medium from day 1 after infection (5 independent experiments).
Conditioned medium was collected from good or bad old fibroblast cultures.
Results are shown as fold change in reprogramming efficiency over old
fibroblasts treated with bad old conditioned medium. Dots and lines as in a.
P values, two-tailed Wilcoxon signed-rank test. c, Different representation of the
data from Fig. 3e; each diamond represents the fold difference in
reprogramming efficiency between a unique pair of good and bad old cultures
(n = 8 pairs of good and bad old cultures, 5 independent experiments). Dots and
lines as in a. P values, two-tailed Wilcoxon signed-rank test with Benjamini-Hochberg
correction. d, Western blot analysis of young fibroblasts at passage 3
treated with the indicated cytokines at the concentration of 10 ng ml⁻¹ for
30 min. Representative of 3 independent experiments. e, Reprogramming
efficiency, assessed as in a, of young fibroblast cultures (3 months, n = 10) at
passage 3 (3 independent experiments). Cells were treated with the indicated
cytokines from day 1 after infection at the concentration of 10 ng ml⁻¹. Results
are shown as log2-transformed fold change in reprogramming efficiency over
untreated cells. Dots and lines as in a. P values, two-tailed Wilcoxon signed-rank
test with Benjamini-Hochberg correction. f, Reprogramming efficiency,
assessed by SSEA1 staining, of young fibroblast cultures (3 months, n = 4) at
passage 3 treated with the indicated cytokines from day 1 after infection at the
concentration of 10 ng ml⁻¹ (2 independent experiments). Results are shown as
log2-transformed fold change in reprogramming efficiency over untreated cells.
Dots and lines as in a. P values, one-tailed Wilcoxon signed-rank test with
Benjamini-Hochberg correction. g, Reprogramming efficiency, assessed as in
a, of old fibroblast cultures (29 months, n = 7) at passage 3 treated with the
indicated cytokines from day 1 after infection at the concentration of 10 ng ml⁻¹
(3 independent experiments). Results are shown as log2-transformed fold
change in the reprogramming efficiency over untreated cells. Note that 1 old
culture was used in 2 independent experiments. In this case, an average of the
resultant measurements was determined. Dots and lines as in a. P values, two-tailed
Wilcoxon signed-rank test with Benjamini-Hochberg correction.

h, Reprogramming efficiency, assessed as in f, of old fibroblast cultures
(29 months, n = 3) at passage 3 treated with the indicated cytokines from day 1
after infection at the concentration of 10 ng ml⁻¹ (2 independent experiments).
Results are shown as log2-transformed fold change in the reprogramming
efficiency over untreated cells. Dots and lines as in a. P values, one-tailed
Wilcoxon signed-rank test with Benjamini-Hochberg correction. i, Western blot
analysis using the indicated antibodies of young fibroblasts at passage 3 treated
with the indicated cytokines (10 ng ml⁻¹) and blocking antibodies (8 µg ml⁻¹) for
30 min. Cytokines were pretreated with either IgG or their corresponding
blocking antibodies for 1 h before treatment. Representative of 2 independent
experiments. j, Western blot analysis of old fibroblasts at passage 3 treated with
the indicated cytokines at a concentration of 10 ng ml⁻¹ for 30 min.
k, Reprogramming efficiency, assessed as in a, of young fibroblast cultures
(3 months, n = 4) at passage 3 treated with the indicated conditions from day 1
after infection (2 independent experiments). Cytokines (10 ng ml⁻¹) were
pretreated with either IgG or their corresponding blocking antibody (8 µg ml⁻¹)
for 1 h before treatment. Results are shown as log2-transformed fold change in
reprogramming efficiency over untreated cells. Dots and lines as in a. P values,
one-tailed Wilcoxon signed-rank test with Benjamini-Hochberg correction.
l, m, Reprogramming efficiency, assessed as in a, of young fibroblast cultures
(3 months, n = 6) at passage 3 treated with the indicated conditions from day 1
after infection (3 independent experiments). Conditioned medium was
pretreated for 1 h with the indicated blocking antibody before administration.
Results are shown as log2-transformed fold change relative to conditioned
medium treated with IgG. Dots and lines as in a. P values, one-tailed Wilcoxon
signed-rank test with Benjamini-Hochberg correction. n, Different
representation of the data from Fig. 3f; each diamond represents the fold
difference in reprogramming efficiency between a unique pair of good and bad
old cultures (n = 6 pairs of good and bad old cultures, 4 independent
experiments). Line marks median. P values, two-tailed Wilcoxon signed-rank test
with Benjamini-Hochberg correction. 0, Heat map showing the Spearman rank 
correlation coefficients between the levels of individual cytokines (top row), the 
ratio of the levels of TNF to other cytokines (middle row), the ratio of the levels of 
IL-6 to other cytokines (bottom row), and reprogramming efficiency in young 

(3 months, n=19) and old (29 months, n=18) cells (2 independent experiments). 
*P<0.05, **P< 0.01; two-sided algorithm AS 89 in R; TNF:IL-6, P= 0.040; IL- 
6:IFNy, P= 0.010; IL-6:IL-18, P=0.040; IL-6:TNF, P= 0.040; IL-6:IL-10, P= 0.040; IL- 
6:CSF2, P= 0.005. The remaining P values can be found in Supplementary 

Table 7. p,q, The levels of IL-6 (p) or TNF (q) are not correlated with 
reprogramming efficiency. The y axis denotes the fold change in the levels of the 
indicated cytokine relative to the median of young mice and thex axis denotes 
the fold change in reprogramming efficiency of the culture relative to the 
median of young mice. Data are from young (3 months, n=19) and old 

(29 months, n=18) mice (2 independent experiments). Pvalues, two-sided 
algorithm AS 89 inR. For individual experiments in a—c, e-h, k-q, see 
Supplementary Table 7. 
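The panels above report log2-transformed fold changes and P values adjusted with the Benjamini-Hochberg correction. The following is a minimal sketch of those two calculations, not the authors' analysis code (the Reporting Summary states that the analysis was done in R). The function names `log2_fold_change` and `benjamini_hochberg` are hypothetical and the input values are made up for illustration.

```python
import math
from typing import List


def log2_fold_change(treated: float, untreated: float) -> float:
    """log2 of the ratio of a treated measurement to its untreated control."""
    return math.log2(treated / untreated)


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Illustrative values only (not data from the study).
raw_p = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw_p))
print(log2_fold_change(4.0, 1.0))
```

The adjusted values are capped at 1 and returned in the same order as the input, so they can be matched back to the conditions they came from.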
Extended Data Fig. 7 | Ageing is associated with an increased variability in
wound healing between old mice, and old fibroblasts in wounds are distinct
from primary fibroblasts derived from healthy ear skin. a, Example images of
ear wounds of young mice, fast-healing old mice (fast old) and slow-healing old
mice (slow old) at the indicated time points (2 independent experiments). Ink
circles depict initial size of wounds. b, Ear wound healing curve from young
(3-4 months, n = 26) and old (24-26 months, n = 28) mice (2 independent
experiments). Full thickness wounds were induced on the dorsal side of both
ears (see ‘Wounding and wound healing experiments’ for details) and the size of
the wounds was assessed by imaging ear wounds every second day for 20 days.
For each mouse, the average of both ear wounds was calculated. Graph depicts
the average percentage of wound area remaining at the indicated time points.
Data are mean ± s.e.m. c, Ear wound healing curves of the five fastest and the five
slowest healing young and old mice. Graph depicts the average percentage of
wound area remaining at the indicated time points. Data are mean ± s.e.m.

d, FACS analysis as described in Fig. 3b to assess the percentage of
THY1⁺PDGFRα⁺Lin⁻ (THY1⁺) cells in ears of young and old mice during basal
conditions and at 7 days after induction of wounds. Results are shown as a
percentage of THY1⁺PDGFRα⁺Lin⁻ cells over PDGFRα⁺Lin⁻ cells. Data shown are
from young basal (3-4 months, n = 9 replicates, each with 2-3 mice), young
wounded (3-4 months, n = 8 replicates, each with 2-3 mice), old basal
(24-26 months, n = 10 replicates, each with 2-3 mice) and old wounded
(24-26 months, n = 8 replicates, each with 2-3 mice) (3 independent
experiments). Each dot represents a replicate with cells pooled from 2-3 mice.
Line depicts median percentage. P values, two-tailed Wilcoxon rank-sum test.
Note that the percentage of THY1⁺PDGFRα⁺Lin⁻ in young and old basal
conditions is also presented in Fig. 3b. e, Pathway enrichment analysis based on
population RNA-seq of young wounded (3-4 months, n = 6 replicates, each with
2-3 mice) and old wounded (24-26 months, n = 6 replicates, each with 2-3 mice)
THY1⁻PDGFRα⁺Lin⁻ and THY1⁺PDGFRα⁺Lin⁻ cells in vivo (1 experiment). The
graph shows a subset of KEGG pathways that were found to be significantly
enriched (FDR-adjusted P < 0.05). For a complete list of differentially expressed
genes and pathways, with corresponding specific P values, see Supplementary
Table 5c, d. **P < 0.01, ***P < 0.001; two-sided nominal P value with
Benjamini-Hochberg correction. f, Comparison between the transcriptomic changes that
occur in fibroblasts with age in vitro (as described in Fig. 2b) and in vivo (as
described in Fig. 3b), as well as changes that occur upon wounding in young and
old ears (as described in d). The heat map depicts the enrichment of the KEGG
pathways that are present in at least two of the conditions described. For the
complete list of differentially expressed genes and significant KEGG terms with
specific P values, see Supplementary Tables 2b, c, 4f, g, 5a, b. The scale for
enrichment is indicated on the left. g, Heat map of expression of a subset of
cytokine genes from population RNA-seq of fibroblasts from young and old ears
during basal and wounded conditions. Expression is shown as VST-transformed
read counts, scaled row-wise. The scale for expression fold changes is indicated
on the left. Basal and wound signatures refer to the average expression of the
genes that are significantly downregulated or upregulated with wounding,
respectively, in this dataset.
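Panel b describes averaging both ear wounds for each mouse and then reporting the mean ± s.e.m. across mice. A minimal sketch of that summary at one time point follows, assuming each mouse contributes a pair of wound-area percentages. The function names and the numbers are hypothetical illustrations, not the authors' code or data.

```python
import math
import statistics
from typing import List, Tuple


def mean_sem(values: List[float]) -> Tuple[float, float]:
    """Mean and standard error of the mean (sample s.d. divided by sqrt(n))."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


def wound_curve_point(ear_pairs: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Summarise one time point from (left ear, right ear) wound-area percentages."""
    # First average the two ear wounds of each mouse, so each mouse counts once.
    per_mouse = [(left + right) / 2 for left, right in ear_pairs]
    # Then summarise across mice.
    return mean_sem(per_mouse)


# Illustrative percentages of wound area remaining for three mice.
mean, sem = wound_curve_point([(80.0, 60.0), (50.0, 50.0), (40.0, 20.0)])
print(f"{mean:.1f} ± {sem:.1f}")
```

Averaging within a mouse before computing the s.e.m. keeps the two ears of one animal from being treated as independent samples.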
Extended Data Fig. 8 | Single-cell RNA-seq analysis of fibroblasts in wounds
from young and old mice and single-cell RNA-seq analysis of entire wounds
from old slow- and fast-healing mice at day 7. a, Quality control for 10x
Genomics single-cell RNA-seq data of freshly isolated PDGFRα⁺Lin⁻
(CD45⁻CD31⁻EpCAM⁻TER119⁻TIE2⁻) cells from wounds of young mice
(3-4 months, cells pooled from n = 10 mice) or old mice (24-26 months, cells pooled
from n = 10 mice), 7 days after induction of wounds. Number of unique genes
(left), percentage of mitochondrial genes (middle) and number of unique
molecular identifiers (UMIs, right) for each cell are shown, separated by age
group. Each dot represents a single cell. b, Seurat analysis of all live high-quality
PDGFRα⁺Lin⁻ cells described in a (3,036 cells in total) identified two main
clusters of cells. Heat map depicts the expression of the top 10 marker genes for
each significant cell cluster identified by Seurat, which are defined as the genes
that are most specific to each population. The cell subpopulation identity
assigned to each cluster is indicated below each column. c, PAGODA of the
single-cell RNA-seq dataset described in a. PAGODA was performed using all
KEGG pathways, and the in vitro fibroblast ageing and the fibroblast activation
signatures (see Supplementary Table 2b, f). Top, heat map of single cells from
wounds from young and old mice and cell clusters identified by Seurat and
PAGODA analyses. Bottom, heat map of the separation of cells based on their
principal component scores for the significantly overdispersed gene sets. Top
heat map, PAGODA clustering of cells. Maroon and blue colours indicate
increased and decreased expression of the associated gene sets, respectively.
Bottom, log2-transformed fold change in the subpopulations between young
and old wounds at day 7. d, PAGODA as described in c. Middle, heat map of the
separation of cells based on their principal component scores for the fibroblast
activation signature. Bottom, heat map of the expression of the genes that are
part of the fibroblast activation signature (see Supplementary Table 2f);
expression is shown as log-transformed and normalized gene expression values
as calculated by Seurat and scaled row-wise. The scale for expression fold
changes is indicated on the right. e, PAGODA as described in c. Middle, heat map
of the separation of cells based on their principal component scores for the
KEGG cytokine-cytokine receptor interaction gene set. Bottom, heat map of the
expression of the genes that are part of the KEGG cytokine-cytokine receptor
interaction; expression is shown as log-transformed and normalized gene
expression values as calculated by Seurat and scaled row-wise. The scale for
expression fold changes is indicated on the right. f, PAGODA as described in c.
Middle, heat map of the separation of cells based on their principal component
scores for the KEGG TNF signalling pathway gene set. Bottom, heat map of the
expression of the genes that are part of the KEGG TNF signalling pathway;
expression is shown as log-transformed and normalized gene expression values
as calculated by Seurat and scaled row-wise. The scale for expression fold
changes is indicated on the right. g, Representative images of the ears of the two
slow-healing and two fast-healing old mice used for single-cell RNA-seq at day 7
after wounding (1 experiment). h, Ear wound healing curves of the ears of the two
slow-healing and two fast-healing old mice used for single-cell RNA-seq. The
percentage of the wound area that was not healed at day 6 after induction of
the wounds is indicated in parentheses. i, Quality control for 10x Genomics
single-cell RNA-seq data of freshly isolated live cells from the ear wounds of
slow-healing (n = 2) and fast-healing old mice (n = 2), 7 days after induction of wounds.
Number of genes (left), percentage of mitochondrial genes (middle) and number
of unique molecular identifiers (right) for each cell are shown, separated by
mouse. Each dot represents a single cell. j, Seurat analysis of all live high-quality
cells described in i (10,797 cells in total) identified seven main clusters of cells.
Heat map depicts the expression of the top 10 marker genes for each significant
cell cluster identified by Seurat, which are defined as the genes that are most
specific to each population. The cell subpopulation identity assigned to each
cluster is indicated below each column.
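The quality-control panels a and i show three per-cell metrics: number of genes detected, percentage of mitochondrial genes and number of UMIs. As a rough sketch of how such metrics can be derived from one cell's UMI counts (the study itself used Seurat in R), the hypothetical helper below assumes counts are held in a gene-to-count mapping and that mitochondrial genes carry the mouse `mt-` symbol prefix; both are assumptions made for illustration.

```python
from typing import Dict


def cell_qc(counts: Dict[str, int], mito_prefix: str = "mt-") -> Dict[str, float]:
    """Compute per-cell QC metrics from a gene -> UMI count mapping.

    The 'mt-' prefix for mitochondrial genes is an assumption of this sketch.
    """
    # Number of genes with at least one UMI in this cell.
    n_genes = sum(1 for c in counts.values() if c > 0)
    # Total number of UMIs in this cell.
    n_umis = sum(counts.values())
    # UMIs assigned to mitochondrial genes.
    mito = sum(c for gene, c in counts.items() if gene.startswith(mito_prefix))
    pct_mito = 100.0 * mito / n_umis if n_umis else 0.0
    return {"n_genes": n_genes, "n_umis": n_umis, "pct_mito": pct_mito}


# Illustrative counts for a single cell (not data from the study).
print(cell_qc({"Thy1": 3, "Pdgfra": 5, "mt-Co1": 2, "Il6": 0}))
```

Cells with few detected genes or a high mitochondrial percentage are commonly filtered out before clustering; the thresholds used in the study are not stated in this caption.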
Extended Data Fig. 9 | Seurat and PAGODA single-cell RNA-seq analyses of
fibroblasts identify distinct fibroblast subpopulations associated with
fast- or slow-healing trajectories. a-g, Analysis of cells identified as fibroblasts
from the single-cell RNA-seq dataset described in Fig. 4c. a, t-SNE clustering of
cells identified as fibroblasts (2,678 cells in total) coloured by significant
clusters identified using a k-nearest neighbour (KNN) graph-based algorithm as
implemented by Seurat, or by mouse. b, log2-transformed fold change in the
number of cells in each of the three subpopulations identified by Seurat between
fast-healing old wounds and slow-healing old wounds. c, Seurat analysis of
fibroblasts (2,678 cells in total) identified three main clusters. Heat map depicts
the expression of the top 10 marker genes for each significant cell cluster
identified by Seurat, which are defined as the genes that are most specific to
each population. The identity of each cell subpopulation assigned to each
cluster is indicated below each column. d, PAGODA of fibroblasts. PAGODA was
performed using raw expression counts and all KEGG pathways, and the in vitro
fibroblast ageing and the fibroblast activation signatures (see Supplementary
Table 2b, f). Top, heat map of single cells from wounds of old mice and cell
clusters identified by Seurat and PAGODA analyses. Bottom, heat map of the
separation of cells based on their principal component scores for the
significantly overdispersed gene sets. Top heat map, PAGODA clustering of cells.
Maroon and blue colours indicate increased and decreased expression of the
associated gene sets, respectively. e, PAGODA as described in d. Bottom, heat
map of the expression of the genes that are part of the fibroblast activation
signature (see Supplementary Table 2f); expression is shown as log-transformed
and normalized gene expression values as calculated by Seurat and scaled
row-wise. The scale for expression fold changes is indicated on the right.
f, Expression of the genes that are part of the KEGG cytokine-cytokine receptor
interaction gene set as in e. g, Expression of the genes that are part of the KEGG
TNF signalling pathway as in e. h-l, Analysis of the combined single-cell RNA-seq
datasets described in Fig. 4b, c. h, Seurat analysis of combined datasets clusters
fibroblasts from both datasets together. t-SNE clustering of all live, high-quality
cells from both datasets (13,833 cells in total) coloured by significant clusters
identified using a KNN graph-based algorithm as implemented by Seurat, or by
mouse. i, t-SNE clustering of combined fibroblasts from the datasets described
in Fig. 4b (PDGFRα⁺Lin⁻) and Fig. 4c. Combined fibroblasts (5,716 cells in total)
are coloured by significant clusters identified using a KNN graph-based
algorithm as implemented by Seurat, or by mouse. j, Seurat analysis of combined
fibroblasts (5,716 cells in total) identified three main subpopulations. Heat map
depicts the expression of the top 10 marker genes for each significant
subpopulation identified by Seurat, which are defined as the genes that are most
specific to each population. The cell subpopulation identity assigned to each
cluster is indicated below each column. k, PAGODA of combined fibroblasts.
PAGODA was performed using Seurat normalized counts and all KEGG pathways,
and the in vitro fibroblast ageing and the fibroblast activation signatures (see
Supplementary Table 2b, f). Top, heat map of single fibroblasts from wounds of
young and old mice or wounds from old fast- or slow-healing mice, and cell
clusters identified by Seurat and PAGODA analyses. Bottom, heat map of
separation of cells based on their principal component scores for a subset of the
top significantly overdispersed gene sets. Top heat map, PAGODA clustering of
cells. Maroon and blue colours indicate increased and decreased expression of
the associated gene sets, respectively. Note that fibroblast subpopulation B did
not contain cells from old/young in the combined analysis. This is probably
owing to the fact that this subpopulation of fibroblasts has some markers of the
haematopoietic lineage, and is probably depleted in the PDGFRα⁺Lin⁻
FACS-sorting scheme used to isolate fibroblasts from the wounds of young and old
mice. l, log-transformed fold change in each of the three combined fibroblast
subpopulations identified by PAGODA between wounds of old fast- and
slow-healing mice, or between wounds from young and old mice, at day 7.
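Panels b and l compare how strongly each fibroblast subpopulation is represented in two groups of wounds, expressed as a log-transformed fold change. A minimal sketch of that kind of comparison is given below. It assumes each cell carries a cluster label and that counts are normalised to the total number of cells per group before taking the ratio; that normalisation, the base-2 logarithm and the function name `log2_fc_by_cluster` are assumptions of this sketch rather than details stated in the caption.

```python
import math
from collections import Counter
from typing import Dict, List


def log2_fc_by_cluster(labels_a: List[str], labels_b: List[str]) -> Dict[str, float]:
    """log2 ratio of each cluster's fraction in group A over its fraction in group B."""
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    total_a, total_b = len(labels_a), len(labels_b)
    result = {}
    # Only clusters present in both groups have a finite fold change.
    for cluster in sorted(set(counts_a) & set(counts_b)):
        frac_a = counts_a[cluster] / total_a
        frac_b = counts_b[cluster] / total_b
        result[cluster] = math.log2(frac_a / frac_b)
    return result


# Illustrative cluster labels for cells from two groups of wounds.
fast = ["A"] * 6 + ["B"] * 2
slow = ["A"] * 3 + ["B"] * 5
print(log2_fc_by_cluster(fast, slow))
```

A positive value means the subpopulation makes up a larger share of the first group, and a negative value means a larger share of the second.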
Extended Data Fig. 10 | FACS schematic for in vitro and in vivo fibroblast
analysis and sorting and full western blot membranes. a, FACS schematic for
analysis and sorting of THY1⁻PDGFRα⁺ and THY1⁺PDGFRα⁺ cells in young and old
cultures at passage 3. Gates shown on each plot are indicated above the plot.
Marker and fluorophore are shown on each axis. FMO, fluorescence minus one.
b, FACS schematic for analysis and sorting of live THY1⁻PDGFRα⁺Lin⁻ and
THY1⁺PDGFRα⁺Lin⁻ cells from young and old fresh tissues (ears), used for
population RNA-seq and single-cell RNA-seq analyses. Gates shown on each plot
are indicated above the plot. Marker and fluorophore are shown on each axis.
c, Full western blot membranes from Extended Data Fig. 6d, i, j. Boxes indicate
the cropped area.
Reporting Summary


Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency 
in reporting. For further information on Nature Research policies, see Authors & Referees and the Editorial Policy Checklist. 


Statistics 


For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section. 
n/a | Confirmed


The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement 


A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly 


The statistical test(s) used AND whether they are one- or two-sided 
Only common tests should be described solely by name; describe more complex techniques in the Methods section. 


A description of all covariates tested


A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons 


A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient)
AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals) 


For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted 
Give P values as exact values whenever suitable.


For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings 


For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes 


Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated


Our web collection on statistics for biologists contains articles on many of the points above. 


Software and code 


Policy information about availability of computer code 


Data collection Imaging: AxioVision 4.7.2, NIS Elements AR software (v4.30.02). 
FACS: BD FACSDiva software (v8.0.1) 
qRT-PCR: Bio-Rad CFX manager (v3.1) 


Data analysis Data analysis was performed using R version 3.2.1, 3.3.1, or 3.5.0. Key packages used were: DESeq2 (v1.20.1, v1.6.3), EdgeR (v3.10.2),
DiffBind (v1.12.3), scde (v0.99.1, v2.8.0), cellrangerRKit (v1.1.0), ggplot2 (version 3.0.0), FlowJo (version 10.2), ImageJ (v1.47), Seurat
(v2.3.4), trim-galore software (v0.2.1), bowtie (v0.12.7), FIXSEQ, MACS (v2.08), TopHat (v2.0.8b), HTSeq (v0.6.1),
Enrichr (http://amp.pharm.mssm.edu/Enrichr/) and QIAGEN’s Ingenuity Pathway Analysis (IPA, QIAGEN Redwood City).


For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers. 
We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information. 


Data 


Policy information about availability of data 


All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: 
- Accession codes, unique identifiers, or web links for publicly available datasets 
- A list of figures that have associated raw data 
- A description of any restrictions on data availability




All sequencing data have been deposited in NCBI BioProject database under the accession code PRJNA316110. Figures 2, 3, and 4 of this study are all associated 
with raw data, which can be found under this accession number. For Figures 2-3, raw FASTQ files for RNA-seq and ChIP-seq are provided. For Figure 4, barcoded 
BAM files for 10x single cell RNA-sequencing are provided. 


Field-specific reporting 


Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection. 


x Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences 


For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf 


Life sciences study design 


All studies must disclose on these points even when the disclosure is negative. 


Sample size Except for the wound healing experiment, no statistical analysis was performed to pre-determine sample size (most of the sample sizes chosen
are standard for the field). This is clearly indicated in Experimental Procedures (Statistical analysis section). In cases where samples from
independent experiments were combined, we have clearly indicated this, and the non-combined data are provided in Supplementary Table
7. For the wound healing experiment, a power analysis was performed based on an initial experiment to determine the sample size required
to detect a difference in variability with a 95% confidence interval.
Data exclusions Samples were excluded due to pre-established exclusion criteria (QC features), if associated phenotypic data were not reproducible, or due to
batch effects that could not be corrected for.


Specifically, two plasma samples from old mice were discarded, as the coefficient of variation (CV) was > 20% for most of the cytokines
measured between the two technical replicates for these two plasma samples. The following RNA-seq samples were excluded from further
analyses: (1) 2 old and 3 middle-aged RNA-seq libraries that lacked corresponding young samples, and could therefore not be corrected for
batch; (2) RNA-seq libraries from 1 good old and 1 bad old fibroblast cultures whose reprogramming efficiency could not be confirmed over
several independent experiments; (3) RNA-seq libraries from 2 iPSC lines (out of 13 total) that failed at the QC stage because they exhibited large
differences (for example in number of reads mapped) from the rest of the samples (Supplementary Table 1f). For the wound healing
experiment, 2 old mice were excluded, as they died before the day of wound closure could be determined. This information is stated in the
corresponding sections in Experimental Procedures.
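The plasma exclusion rule above (CV > 20% between two technical replicates for most cytokines) can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the CV is computed from the sample standard deviation, "most" is read as more than half of the cytokines measured, and the function names and example values are hypothetical.

```python
import statistics
from typing import Dict, List


def cv_percent(replicates: List[float]) -> float:
    """Coefficient of variation (%) of technical replicates, using the sample s.d."""
    return 100.0 * statistics.stdev(replicates) / statistics.mean(replicates)


def should_exclude(sample: Dict[str, List[float]], threshold: float = 20.0) -> bool:
    """Flag a sample when more than half of its cytokines exceed the CV threshold.

    Reading 'most' as 'more than half' is an assumption of this sketch.
    """
    flagged = sum(1 for reps in sample.values() if cv_percent(reps) > threshold)
    return flagged > len(sample) / 2


# Illustrative duplicate measurements for one plasma sample.
plasma = {"IL-6": [8.0, 12.0], "TNF": [5.0, 9.0], "IL-10": [10.0, 10.0]}
print({name: round(cv_percent(reps), 1) for name, reps in plasma.items()})
print(should_exclude(plasma))
```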


Replication All attempts at replication were successful. Independent replication was done for: Fig. 1b, c, Fig. 3a, b, c, d, e, f, g, Fig. 4a, Extended Data Fig.
1a, b, e, l, m, n, Extended Data Fig. 5b, d, e, f, g, h, l, m, n, o, p, q, r, s, t, u, Extended Data Fig. 6a, b, c, d, e, f, g, h, i, k, l, m, n, o, p, q, Extended
Data Fig. 7a, b, c, d, Extended Data Fig. 10a, b, c. All of the independent experiments are presented in a supplementary table (Supplementary
Table 7, except images such as western blots) and this is indicated in figure legends. Independent replication was not done for: Fig. 2b, c, d, e,
f, Fig. 4b, c, d, Extended Data Fig. 1c, f, g, j, k, o, Extended Data Fig. 2, Extended Data Fig. 3, Extended Data Fig. 4, Extended Data Fig. 5a, c, i, j,
Extended Data Fig. 7e, f, g, Extended Data Fig. 8, Extended Data Fig. 9. The “omics” datasets comprise independent biological samples, but
were not independently replicated (RNA-seq, single-cell RNA-seq, ChIP-seq and metabolomics datasets). Human fibroblast experiments,
experiments with fibroblasts at passage 33, and generation of iPSC lines were not independently replicated. This is stated in the
corresponding figure legends.


Randomization For the majority of experiments, young and old mice were processed in an alternating manner rather than in two large groups to minimize
group effect. This is indicated in Experimental Procedures (Statistical analysis section).
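As a small illustration of the alternating processing order described above, the hypothetical helper below interleaves two groups of animal identifiers so that neither group is handled as one large block. The identifiers and the function name are made up for this sketch; the actual ordering procedure is described only in the Experimental Procedures.

```python
from itertools import zip_longest
from typing import List


def alternating_order(young: List[str], old: List[str]) -> List[str]:
    """Interleave two groups (young first) so they are processed alternately."""
    order = []
    for y, o in zip_longest(young, old):
        # zip_longest pads the shorter group with None; skip the padding.
        if y is not None:
            order.append(y)
        if o is not None:
            order.append(o)
    return order


# Hypothetical animal identifiers.
print(alternating_order(["Y1", "Y2"], ["O1", "O2", "O3"]))
```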


Blinding For all quantifications that were done with FACS or automated image quantification, no blinding was performed, including Fig. 3a, b, c,
Extended Data Fig. 5l, m, n, Extended Data Fig. 7d. The other experiments were blinded, with the exception of Fig. 1c, Extended Data Fig. 1,
m, Extended Data Fig. 5e, Extended Data Fig. 6f, g, h, k. This is indicated in Experimental Procedures (Statistical analysis section).


Reporting for specific materials, systems and methods 


We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, 
system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response. 


Materials & experimental systems          Methods
n/a | Involved in the study               n/a | Involved in the study
Antibodies                                ChIP-seq
Eukaryotic cell lines                     Flow cytometry
Palaeontology                             MRI-based neuroimaging
Animals and other organisms
Human research participants
Clinical data


Antibodies 


Antibodies used Antibodies used for FACS analysis: PE-CD140a (BioLegend, #135905, Clone: APA5, Lot#B218257/B244566, [1:100]), FITC-CD90.2
(BioLegend, #105305, Clone: 30-H12, Lot#B184407/B224687, [1:200]), Brilliant Violet 421-TER119 (BioLegend, #116234, Clone:
TER-119, Lot#B244217, [1:100]), eFluor 450-CD326 (Fisher Scientific, #50-163-76, Clone: G8.8, Lot#4277888, [1:200]), Pacific
Blue-CD45 (BioLegend, #103126, Clone: 30-F11, Lot#B242848/B253970, [1:200]), Pacific Blue-CD31 (BioLegend, #102422, Clone:
390, Lot#8182438, [1:100]), Biotin-CD202b (Thermo Fisher Scientific, #13-5987-82, Clone: TEK4, Lot#B231548, [1:200]), Brilliant
Violet 421-Streptavidin (BioLegend, #405226, Lot#B240413, [1:200]), APC-B220 (eBioscience, #47-0452-82, Clone: RA3-6B2,
Lot#NN, [1:100]), APC-CD3 (BD Pharmingen, #557597, Clone: SP34-2, Lot#NN, [1:100]), APC-Gr-1 (eBioscience, #17-5931-82,
Clone: RB6-8C5, Lot#NN, [1:100]), APC-F4/80 (eBioscience, #17-4801-82, Clone: BM8, Lot#NN, [1:100]), APC-Siglec H (BioLegend,
#129611, Clone: 551, Lot#NN, [1:100]), APC-CD11c (eBioscience, #17-0114-82, Clone: N418, Lot#NN, [1:100]).

Antibodies used for immunofluorescence: OCT3/4 (Santa Cruz Biotech, #sc9081, Polyclonal, Lot#L2211, [1:200]), SSEA-1
StainAlive DyLight488 (Stemgent, #09-0067, Clone: MC-480, Lot#J16010000000009/2482, [1:100-200]), SOX2 (Santa Cruz
Biotech, #sc17320, Polyclonal, Lot#, [1:200]), αSMA (Abcam, #ab7817, Clone: 1A4, Lot#GR119216-7, [1:2000]), and Alexa Fluor
488 αSMA (Abcam, #ab184675, Clone: 1A4, Lot#GR316286-2, [1:200]).


Antibodies used for blocking experiments: IgG (R&D systems, #AB-108-C, Polyclonal, Lot#ES4116081/ES4115041/ES4114041, [8
μg/mL]), IL6 (R&D systems, #AB-406-NA, Polyclonal, Lot#BF0916041/BF0913111, [8 μg/mL]), and TNFα (R&D systems,
#AB-410-NA, Polyclonal, Lot#CT0714031/CT0715031/CT0716021, [8 μg/mL]).




Antibodies used for Western blot: phospho-STAT3 (Tyr705) (Cell Signaling Technology, #9145, Clone: D3A7, Lot#26, [1:1000]),
STAT3 (Invitrogen, #44-364G, Clone: 44-364G, Lot#0601, [1:2000]), phospho-STAT6 (Tyr641) (Cell Signaling Technology, #9361,
Polyclonal, Lot#12, [1:1000]), STAT6 (Cell Signaling Technology, #5397, Clone: D3H4, Lot#1, [1:1000]), phospho-AKT (Ser473)
(Cell Signaling Technology, #4060, Clone: D9E, Lot#19, [1:2000]), AKT (Cell Signaling Technology, #4691, Clone: C67E7, Lot#20,
[1:1000]), phospho-NFκB (Ser536) (Cell Signaling Technology, #3033, Clone: 93H1, Lot#14, [1:1000]), NFκB (Cell Signaling
Technology, #8242, Clone: D14E12, Lot#4, [1:1000]), phospho-JNK1&2 (Thr183 & Tyr185) (Invitrogen, #44-682G, Polyclonal,
Lot#RC220615, [1:1000]), JNK1 (Invitrogen, #44-690G, Polyclonal, Lot#RC222625, [1:2000]), and β-actin (Novus Biologicals,
#NB600-501, Clone: AC-15, Lot#061M4808, [1:50,000]).


Antibodies used for ChIP experiments: H3K4me3 (Active Motif, #39159, Polyclonal, Lot#NN, 5 μg), and H3K27me3 (Active Motif,
#39536, Clone: 7B11, Lot#NN, 5 μg).


Antibodies used for MACS: APC PSA-NCAM (Miltenyi, #130-093-273, Clone: 2-2B, Lot#NN, [1:8]). 


Secondary antibodies: HRP-conjugated goat anti-mouse (Calbiochem, #401215, Lot#D00157542, [1:5000]), HRP-conjugated goat
anti-rabbit (Calbiochem, #401393, Lot#D00168510, [1:5000]), donkey anti-mouse AF568 (ThermoFisher, #410037, Lot#1752099,
[1:500]), donkey anti-rabbit AF568 (ThermoFisher, #410042, Lot#1964370, [1:500]).


NN = Not noted


Validation Only commercially available antibodies that have been widely cited in the literature were used in this study. Antibody specificity
and quality validation were performed by the manufacturers (see manufacturers' webpages for further information).


The following additional validations were performed:

-The CD140a (PDGFRa) and CD90 (THY1) antibodies were validated by confirming the expression/lack of expression of the 
respective genes by RNA-seq in the FACS-sorted populations. 

-The OCT3/4 and SOX2 staining were nuclear as expected. 

-Antibodies used for blocking experiments (polyclonal IL6 and TNFa antibodies) were validated by testing whether they block the 
activity of corresponding cytokine in in vitro assays. 

-Antibodies used for western blotting gave a band of the expected molecular weight. 

-Phospho-antibodies used for western blotting exhibited increased signal in samples treated with cytokines that are known to 
induce phosphorylation of the corresponding protein (e.g. IL6 treatment induced phospho-STAT3, TNFa treatment induced 
phospho-NFkB and phospho-JNK1&2). 


Eukaryotic cell lines 


Policy information about cell lines 


Cell line source(s) 293T (ATCC, #CRL-11268) 


Authentication Cell line was not authenticated in-house, but 293T cells were purchased from ATCC where routine cell authentication is
conducted. 293T cells were used only at early passages (< passage 20) for lentiviral production.


Mycoplasma contamination 293T cells were negative for mycoplasma. Mycoplasma testing was conducted at regular intervals (2-3 months). 


Commonly misidentified lines - 
(See ICLAC register) 


Animals and other organisms 


Policy information about studies involving animals; ARRIVE guidelines recommended for reporting animal research 


Laboratory animals All mice used in this study were male C57BL/6 obtained from the NIA Aged Rodent colony with ages ranging from 3-4 months for
young adult animals, 12-13 months for middle-aged animals, and 20-30 months for old animals (precise ages are stated for each
experiment in Supplementary Data Table 7). Mice were habituated for >1 week at Stanford before use. At Stanford, all mice
were housed in the Comparative Medicine Pavilion, and their care monitored by the Veterinary Service Center at Stanford
University under IACUC protocol #8661.


Wild animals No wild animals were used in this study. 
Field-collected samples No field collection of samples was conducted in this study. 
Ethics oversight At Stanford, all mice were housed in the Comparative Medicine Pavilion, and their care monitored by the Veterinary Service
Center at Stanford University under IACUC protocol #8661.


Note that full information on the approval of the study protocol must also be provided in the manuscript. 


Human research participants 


Policy information about studies involving human research participants 


Population characteristics As stated in Experimental Procedures, biopsies were collected from male participants of different ages (ranging from 25-90 years 
old, Supplementary Table 1g) with four biological grandparents of Ashkenazi Jewish descent, generally healthy without thyroid 
disease, diabetes, immunodeficiency, ongoing cancer or autoimmune disease, and no history of poor wound healing. 
Recruitment All participants were recruited using paper postings or word of mouth. To restrict confounds of genetic background and disease
state, participants were required to have four biological grandparents of Ashkenazi Jewish descent, and to be generally healthy
without thyroid disease, diabetes, immunodeficiency, ongoing cancer or autoimmune disease, and no history of poor wound
healing. Hence, there is a bias in genetic background and disease state.


Ethics oversight Stanford Human Subjects approval and informed consent were obtained prior to all study procedures. 


Note that full information on the approval of the study protocol must also be provided in the manuscript. 


ChIP-seq 


Data deposition 


Confirm that both raw and final processed data have been deposited in a public database such as GEO. 


Confirm that you have deposited or provided access to graph files (e.g. BED files) for the called peaks. 


Data access links All raw sequencing reads for ChIP-seq data can be found under BioProject PRJNA316110. 
May remain private before publication. 


Files in database submission The files are annotated as: 
Young_Fib_culture4_H3K27me3 
Young_Fib_culture5_H3K27me3 
Old_Fib_culture4_H3K27me3 
Old_Fib_culture5_H3K27me3 
Old_Fib_culture6_H3K27me3 

Young_Fib_culture4_H3K4me3 
Young_Fib_culture5_H3K4me3 
Old_Fib_culture4_H3K4me3 
Old_Fib_culture5_H3K4me3 
Old_Fib_culture6_H3K4me3 

Young_Fib_culture4_INPUT 
Young_Fib_culture5_INPUT 
Old_Fib_culture4_INPUT 
Old_Fib_culture5_INPUT 
Old_Fib_culture6_INPUT 


Genome browser session (e.g. UCSC) http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=salahm&hgS_otherUserSessionName=Mahmoudi_et_al_2018 

Methodology 

Replicates ChIP-seq (H3K4me3 and H3K27me3) and input (10% of chromatin used for each ChIP reaction) libraries were generated for 
2 young and 3 old independent fibroblast cultures. 


Sequencing depth ChIP and input libraries were sequenced on the Illumina HiSeq 2000 platform (single-end 50 bp reads). The table below indicates the 
number of reads acquired for each sample and quality-control measures. 


Sample                       Molecule  Reads     Read length  FIXSEQ unique reads  PCR duplication rate (FIXSEQ)  MACS2 peaks (FDR <1%)  Peaks with >5 fold enrichment (MACS2)
Old_Fib_culture4_H3K27me3    H3K27me3  43662273  50           27670751             29.28                          48741                  29084
Old_Fib_culture4_H3K4me3     H3K4me3   21847330  50           14813404             17.26                          25543                  24571
Old_Fib_culture4_INPUT       INPUT     20848880  50           16879828             1.5                            NA                     NA
Old_Fib_culture5_H3K27me3    H3K27me3  49256681  50           35092468             20.82                          100354                 68123
Old_Fib_culture5_H3K4me3     H3K4me3   23314898  50           17070330             10.76                          24556                  23557
Old_Fib_culture5_INPUT       INPUT     21550022  50           17799159             1.29                           NA                     NA
Old_Fib_culture6_H3K27me3    H3K27me3  50766024  50           36941549             19.12                          79715                  52861
Old_Fib_culture6_H3K4me3     H3K4me3   23894584  50           16569260             16.34                          24882                  24162
Old_Fib_culture6_INPUT       INPUT     20205828  50           15692624             7.18                           NA                     NA
Young_Fib_culture4_H3K27me3  H3K27me3  49198534  50           34228562             22.21                          100087                 66305
Young_Fib_culture4_H3K4me3   H3K4me3   23664469  50           13669641             29.49                          23354                  22605
Young_Fib_culture4_INPUT     INPUT     20248064  50           15203520             7.93                           NA                     NA
Young_Fib_culture5_H3K27me3  H3K27me3  39027396  50           23785464             31.32                          25087                  29084
Young_Fib_culture5_H3K4me3   H3K4me3   21376578  50           12492647             31.72                          21843                  21373
Young_Fib_culture5_INPUT     INPUT     18897473  50           15341132             1.32                           NA                     NA
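
As a quick consistency check on the table above, the fraction of reads retained after duplicate removal can be computed per sample. The following is a minimal Python sketch using read counts copied from three rows of the table. The function name `unique_fraction` is illustrative, and this ratio is not assumed to be the complement of the "PCR duplication rate" column, which may be defined differently by FIXSEQ.

```python
# (total reads, FIXSEQ unique reads), copied from three rows of the table above.
READS = {
    "Old_Fib_culture4_H3K27me3": (43662273, 27670751),
    "Old_Fib_culture4_INPUT": (20848880, 16879828),
    "Young_Fib_culture5_H3K4me3": (21376578, 12492647),
}


def unique_fraction(total_reads: int, unique_reads: int) -> float:
    """Return the fraction of sequenced reads kept after duplicate removal."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return unique_reads / total_reads


if __name__ == "__main__":
    for sample, (total, unique) in READS.items():
        print(f"{sample}: {unique_fraction(total, unique):.1%} of reads retained")
```

Input libraries retain a larger share of reads than ChIP libraries in these rows, which is the pattern one would expect when enrichment concentrates reads at fewer genomic positions.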




Antibodies Antibodies used for ChIP experiments: H3K4me3 (Active Motif, #39159), H3K27me3 (Active Motif, #39536). 


Peak calling parameters Trimmed reads were mapped to the mm9 genome assembly using bowtie v0.12.7. Duplicate reads were eliminated using 
the FIXSEQ software with default parameters. ChIP-seq peaks were called in all samples using the MACS (v2.08) software 
with default settings and the “--broad” option. Input datasets were used as baseline. 


Data quality For analysis, Fastq reads were quality-trimmed using the trim-galore software (v0.2.1), with a Phred score threshold of 15, 
and a minimum remaining read length of 36bp. The number of peaks for each sample at FDR 0.01, and above 5-fold 
enrichment is indicated in the table above. 


Software The following packages were used for analyses: trim-galore software (v0.2.1), bowtie (v0.12.7), FIXSEQ, and MACS (v2.08). 
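
The processing steps described above (quality trimming, alignment, broad peak calling) could be assembled roughly as follows. This is a hedged sketch, not the authors' actual commands: the file names and the index name `mm9` are hypothetical, the `-g mm` genome-size setting is an assumption that the text does not state, and the FIXSEQ de-duplication step is omitted because its invocation is not described in this document. The sketch only builds argument lists and does not run any tool.

```python
import shlex
from typing import List


def trim_cmd(fastq: str) -> List[str]:
    """Trim Galore call with Phred threshold 15 and minimum read length 36 bp."""
    return ["trim_galore", "--quality", "15", "--length", "36", fastq]


def align_cmd(index: str, trimmed_fastq: str, sam_out: str) -> List[str]:
    """bowtie (version 1) call; -S requests SAM output."""
    return ["bowtie", "-S", index, trimmed_fastq, sam_out]


def peak_cmd(chip: str, control: str, name: str) -> List[str]:
    """MACS2 broad peak call with the input library as control.

    '-g mm' (mouse effective genome size) is an assumption, not stated in the text.
    """
    return ["macs2", "callpeak", "-t", chip, "-c", control,
            "-g", "mm", "--broad", "-n", name]


if __name__ == "__main__":
    # All file names below are hypothetical placeholders.
    steps = [
        trim_cmd("sample.fastq"),
        align_cmd("mm9", "sample_trimmed.fq", "sample.sam"),
        peak_cmd("sample_chip.bam", "sample_input.bam", "sample"),
    ]
    for step in steps:
        print(shlex.join(step))
```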


Flow Cytometry 


Plots 


Confirm that: 


The axis labels state the marker and fluorochrome used (e.g. CD4-FITC). 


The axis scales are clearly visible. Include numbers along axes only for bottom left plot of group (a 'group' is an analysis of identical markers). 


All plots are contour plots with outliers or pseudocolor plots. 


A numerical value for number of cells or percentage (with statistics) is provided. 


Methodology 


Sample preparation For FACS quantification and sorting of fibroblast subpopulations from primary fibroblast cultures 
We performed FACS analysis and sorting of THY1+/PDGFRa+ and THY1-/PDGFRa+ cells from primary fibroblast cultures at 
passage 3. Cells were dissociated and resuspended in FACS buffer (1% BSA, 1 mM EDTA in PBS), and stained for the cell surface 
markers PE-CD140a (BioLegend, #APA5) and FITC-CD90.2 (BioLegend, #30-H12) for 30 minutes on ice. For EdU incorporation 
experiments, fibroblast cultures were assessed by FACS using the Click-iT EdU Plus FACS PacBlue Kit (Invitrogen, #C10636) in 
accordance with the manufacturer’s instructions. Briefly, fibroblasts were incubated in media containing 5-ethynyl-2'-deoxyuridine 
(EdU; 10 µM) for 4 hours. Cells were then dissociated and resuspended in FACS buffer (1% BSA in PBS). Cell surface markers were 
stained with PE-CD140a (BioLegend, #APA5) and FITC-CD90.2 (BioLegend, #30-H12). Cells were then fixed (4% 
paraformaldehyde, PBS) and permeabilized, followed by click reaction to detect EdU, according to the manufacturer's 
instructions. FACS analysis was performed on an LSR II flow cytometer (BD Biosciences), and FACS sorting was performed 
on a BD FACS Aria II sorter, using a 100 µm nozzle at 13.1 PSI. 


For FACS quantification and sorting of fibroblasts from fresh tissues 

We isolated fibroblasts from the ears of young and old mice for FACS quantification and for transcriptomic analysis at 
the population level. Ears of 2-3 mice were dissected and pooled for each sample, cut into small fragments (~1 mm²), and then 
digested in Dulbecco’s Modified Eagle Medium (DMEM, Invitrogen, #11965-092) supplemented with 0.14 Wunsch units/mL of 
Liberase DL (Roche, #5401160001) for 30 minutes at 37°C. The fragments were washed with DMEM supplemented with 20% 
fetal bovine serum (FBS, Gibco, #16000-044), funneled through a 100 µm nylon mesh (Fisher Scientific, #08-771-19), and washed 
with fibroblast growth medium (DMEM supplemented with 10% FBS and 1% PSQ). A second filtering was performed using a 
40 µm nylon mesh (Fisher Scientific, #08-771-1), followed by a washing step with fibroblast growth medium. Finally, cells were 
washed with FACS buffer (PBS, 1% BSA, 500nM EDTA), and then re-suspended in FACS buffer, ready to be stained for FACS 
analysis. For in vivo FACS analysis and sorting the following antibodies were used: CD140a (BioLegend, #APA5), CD90.2 
(BioLegend, #30-H12), TER119 (BioLegend, #116234), CD326 (Thermo Fisher Scientific, #50-163-76), CD45 (BioLegend, 
#103126), CD31 (BioLegend, #102422), CD202b (Thermo Fisher Scientific, #15-5987-82), Brilliant Violet 421 Streptavidin 
(BioLegend, #405226) and DAPI staining solution (Thermo Fisher Scientific, #62248). 


For Single cell RNA-seq of young and old wounds using 10x Genomics Chromium single-cell 

Single cell RNA-sequencing was performed on all live PDGFRa+ Lin- (TER119-/CD326-/CD202b-/CD45-/CD31-) cells in the 
wounded area from young and old mice, 7 days after wounding. To this end, we pooled cells from 10 young (3-4 months), or 10 
old (25-27 months) male C57BL/6 mice from the NIA aged colony. FACS sorting was performed as described above, with the 
exception that all PDGFRa+ Lin- (TER119-/CD326-/CD202b-/CD45-/CD31-) cells were sorted into fibroblast growth media. Cells 
were then spun down at 300 × g for 5 minutes at 4 °C and resuspended in fibroblast growth media at a concentration of 300 cells/µL. 
In total, 18,000 young cells and 18,000 old cells were loaded onto a respective 10x Genomics Chromium chip per 
manufacturer’s recommendations. 
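
The stated cell number and concentration imply a suspension volume, which can be checked with simple arithmetic: 18,000 cells at 300 cells/µL corresponds to 60 µL. A small Python sketch of that calculation follows; the function name is illustrative, and the result is the volume of suspension containing the stated number of cells, not necessarily the volume loaded per channel.

```python
def suspension_volume_ul(n_cells: int, cells_per_ul: float) -> float:
    """Volume (in microlitres) of suspension that contains n_cells."""
    if cells_per_ul <= 0:
        raise ValueError("cells_per_ul must be positive")
    return n_cells / cells_per_ul


if __name__ == "__main__":
    # Values taken from the text: 18,000 cells at 300 cells per microlitre.
    print(suspension_volume_ul(18_000, 300))
```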


For Single cell RNA-seq of fast-healing and slow-healing old wounds using 10x Genomics Chromium single-cell 

Single cell RNA-sequencing was performed on all live cells in the wounded area from 2 fast-healing and 2 slow-healing old mice, 
7 days after wounding. Live/dead staining was performed using 1 µg/mL propidium iodide (BioLegend). FACS sorting was 
performed on a BD FACS Aria Fusion sorter using a 100 µm nozzle. Cells were sorted into chilled fibroblast growth media. Cells 
were then spun down at 300 × g for 5 min at 4 °C and resuspended in fibroblast growth media at a concentration of 1000-1500 
cells/µL. Cells were loaded onto a 10x Genomics Chromium chip as described above. 


Instrument FACS analysis was performed on an LSR II flow cytometer (BD Biosciences), and FACS sorting was performed on a BD FACS Aria II 
sorter or a BD FACS Fusion sorter, using a 100 µm nozzle. All instruments were housed in the Stanford Shared FACS Facility. 


Software All flow cytometry data was analyzed using FlowJo version 10.0.7. 


Cell population abundance To determine the purity of the primary fibroblasts from young and old mice, FACS analysis was performed on fibroblast cultures 
at passage 3. Cultures were stained for PDGFRa (a general fibroblast marker), in combination with marker genes for possible 
contaminants, including B and T cells, granulocytes, monocytes, and dendritic cells. Our FACS analysis of fibroblasts from fresh 
tissues gates out possible contaminants, including immune cells (CD45), endothelial cells (CD31, TIE2), epithelial cells (EpCAM) and 
red blood cells (TER119), and gates for PDGFRa+ cells that are either THY1+ or THY1-. In the context of single-cell RNAseq from 
young and old wounds, we gated for CD45-/CD31-/TIE2-/EpCAM-/TER119-/PDGFRa+ cells. 
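
The gating logic described above can be written as a short decision rule. The sketch below is illustrative only: it assumes each cell has already been called positive or negative for every marker, whereas real gates are fluorescence thresholds set with fluorescence-minus-one controls. The dictionary layout and the function name `classify` are hypothetical.

```python
# Markers used to gate out contaminating lineages, as listed in the text.
LINEAGE_MARKERS = ("CD45", "CD31", "TIE2", "EpCAM", "TER119")


def classify(cell: dict) -> str:
    """Assign a cell to a gate from boolean marker calls (True = positive)."""
    if any(cell[marker] for marker in LINEAGE_MARKERS):
        return "excluded (lineage+)"
    if not cell["PDGFRa"]:
        return "excluded (PDGFRa-)"
    return "PDGFRa+ THY1+" if cell["THY1"] else "PDGFRa+ THY1-"


if __name__ == "__main__":
    negative = {marker: False for marker in LINEAGE_MARKERS}
    print(classify({**negative, "PDGFRa": True, "THY1": True}))
    print(classify({**negative, "PDGFRa": True, "THY1": False}))
    print(classify({**negative, "CD45": True, "PDGFRa": True, "THY1": True}))
```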


Gating strategy Gating was determined using fluorescent-minus-one controls for each color used in each FACS experiment to ensure that 
positive populations were solely associated with the antibody for that specific marker (see Extended Data Fig. 10). 


Tick this box to confirm that a figure exemplifying the gating strategy is provided in the Supplementary Information. 
Although glucose-sensing neurons were identified more than 50 years ago, the 
physiological role of glucose sensing in metazoans remains unclear. Here we identify a 
pair of glucose-sensing neurons with bifurcated axons in the brain of Drosophila. One 
axon branch projects to insulin-producing cells to trigger the release of Drosophila 
insulin-like peptide 2 (dilp2) and the other extends to adipokinetic hormone (AKH)- 
producing cells to inhibit secretion of AKH, the fly analogue of glucagon. These axonal 
branches undergo synaptic remodelling in response to changes in their internal energy 
status. Silencing of these glucose-sensing neurons largely disabled the response of 
insulin-producing cells to glucose and dilp2 secretion, disinhibited AKH secretion in 
corpora cardiaca and caused hyperglycaemia, a hallmark feature of diabetes mellitus. 
We propose that these glucose-sensing neurons maintain glucose homeostasis by 
promoting the secretion of dilp2 and suppressing the release of AKH when 
haemolymph glucose levels are high. 


Glucose-sensing neurons respond to glucose or its metabolites, which 
act as signalling cues to regulate their neuronal activity. According to 
the glucostatic hypothesis proposed in 1953, feeding and related behaviours 
are regulated by neurons in the brain that sense changes in glucose 
levels in the blood¹. Despite the discovery of glucose-sensing neurons 
in the hypothalamus through electrophysiological methods more 
than ten years later², the physiological role of these neurons remained 
unclear³,⁴ until recently, when a population of glucose-excited neurons in 
the Drosophila brain was determined to function as an internal nutrient 
sensor to mediate the animal’s consumption of sugar⁵. A large number of 
glucose-sensing neurons appear to be present in animals⁶; we speculated 
that these neurons mediate physiological functions that are critical 
for the wellbeing of the animal, including glucose homeostasis. Here 
we report the identification of a pair of glucose-excited neurons in the 
Drosophila brain that maintain glucose homeostasis by coordinating 
the activity of the two key hormones involved in the process: insulin 
and glucagon. 


CN neurons project to the PI and CC 


To identify neurons that respond to sugar on the basis of its nutritional 
value, we used a two-choice assay⁷ to screen Vienna tiles (VT)-Gal4 
Drosophila lines⁸ that had been crossed to UAS-Kir2.1, tub-Gal80ts flies 
(inward-rectifier potassium ion channel allele Kir2.1 with tubulin-temperature-sensitive 
Gal80) for defects in their ability to select nutritive 
D-glucose over non-nutritive L-glucose (Extended Data Fig. 1a, see Methods). 
We isolated two independent Gal4 lines, VT58471 and VT43147-Gal4, 
that failed to select D-glucose after periods of starvation and appeared 
to contain dorsolateral cells that resemble those that are labelled by 
the corazonin (Crz)-Gal4 line⁹ (Extended Data Fig. 1b, c, arrowheads). 
Flies in which Crz-Gal4-expressing neurons had been inactivated failed 
to select D-glucose even when starved (Extended Data Fig. 1d). These 
results suggest that the dorsolateral neurons labelled by Crz-Gal4 and 
the two candidate Gal4 lines mediate the behavioural response to sugar. 

We used a Crz antibody to confirm the identity of the dorsolateral 
neurons (Fig. 1a, top right). A previous study demonstrated that a subset 
of Crz-expressing neurons also express short neuropeptide F (sNPF)¹⁰. 
Immunolabelling revealed that the dorsolateral neurons expressing 
Crz indeed express sNPF (Fig. 1a, bottom right). On the basis of these 
findings, we named these Crz+sNPF+ neurons CN neurons. To restrict 
Gal4 expression to a few cells that include the dorsolateral neurons, 
we crossed VT58471-Gal4 to choline acetyltransferase (ChAT)-Gal80, 
generating CN-Gal4, which unambiguously labelled a pair of CN neurons 
when crossed to UAS-mCD8::GFP (Fig. 1a, left). Flies in which these 
dorsolateral neurons were inactivated using CN-Gal4 failed to select 
D-glucose when starved (Fig. 1b). Each CN cell body projects an axon 
that bifurcates to form two major branches (Fig. 1a). One branch (axon 1) 
projects to the pars intercerebralis (PI) region of the brain and the other 
branch (axon 2) projects ventrolaterally towards the corpora cardiaca 
(CC)¹¹,¹² (Fig. 1a, c, Extended Data Fig. 2a). We used an intersectional 
approach to define these projections further, thereby validating that 
axon 1 innervates the PI and axon 2 projects to the CC (Fig. 1d, Extended 
Data Fig. 2b, c, see Methods). We also used this approach to induce the 
expression of tetanus toxin (TNT)¹³ to silence a pair of CN neurons. These 
flies failed to choose D-glucose even after starvation when CN neurons 
were inactivated (Fig. 1e, see Methods). This provided further evidence 
of the contribution of the pair of dorsolateral CN neurons to glucose-evoked 
behaviour. 
Studies, La Jolla, CA, USA. (Department of Genetics and Development, Columbia University, New York, NY, USA. ‘Department of Biological Sciences, Korea Advanced Institute of Science and 
Technology, Daejeon, South Korea. *e-mail: seongbaesuh@kaist.ac.kr 


Nature | Vol 574 | 24 OCTOBER 2019 | 559 


Article 


a 
CN-Gal4 (VT58471-Gal4, ChAT-Gal80) > UAS-mCD8::GFP 
Fig. 1 | A pair of glucose-sensing CN neurons in the brain show a unique 
projection pattern. a, CN neurons labelled with GFP (green) counterstained 
with nc82 antibody (magenta) in the brain and CC. CN neurons (arrowheads) 
extend their neurites centrally (A1) and ventrolaterally (A2). Asterisks denote 
unrelated cells labelled by CN-Gal4. Scale bar, 50 µm. CN cell bodies labelled with 
GFP (green) co-stained with Crz (magenta, top right) and sNPF antibodies 
(magenta, bottom right). Scale bar, 5 µm. b, Inactivation of CN neurons by 
expressing UAS-Kir2.1 and tub-Gal80ts under the control of CN-Gal4 at 30 °C 
blunts a preference for D-glucose in starved flies. c, CN axons innervate insulin-producing 
cells (IPCs), stained with dilp2 antibody (magenta), via axon 1 (A1, left) 
and AKH-producing cells, stained with AKH antibody (cyan), via axon 2 
(A2, right). CN axons and dendrites are stained with GFP (green) and DenMark 
detectable dsRed (magenta, see Methods) antibodies, respectively. Scale bar, 
20 µm. d, Intersectional labelling of CN neurons by GFP (green) co-stained with 
AKH antibody (magenta). Scale bar, 50 µm. e, Intersectional silencing of CN 
neurons blunts a preference for D-glucose in starved flies. Images shown are 
z-stacked projections, except in a, right. In all figures, plots show mean ± s.e.m. 
***P < 0.001; one-way ANOVA with Tukey post hoc test. Sample sizes and 
statistical analyses are shown in Supplementary Table 1. 
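
For readers unfamiliar with the summary statistic used in the plots, the mean and standard error of the mean (s.e.m.) can be computed as in the following minimal Python sketch. The input values are hypothetical and are not data from this study; the function name `mean_sem` is illustrative.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_sem(values: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, s.e.m.), where s.e.m. = sample standard deviation / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed to estimate the s.e.m.")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


if __name__ == "__main__":
    # Hypothetical preference scores from three replicate assays.
    mean, sem = mean_sem([0.42, 0.55, 0.48])
    print(f"{mean:.3f} ± {sem:.3f}")
```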


Fig. 2 | CN neurons are activated by nutritive 

Fig. 3 | IPC activity and dilp2 secretion require an excitatory signal from CN 
neurons. a, b, Representative images (a) and quantification of the number of 
Syt-GFP+ puncta (green) (b) in axon 1 (A1) of CN neurons in fed, starved or refed 
flies carrying Crz-Gal4 and UAS-Syt::GFP. The outlined region shows CN axonal 
projections to IPCs. c, Native GRASP-induced fluorescence (green, arrowheads), 
co-stained with dilp2 antibody (magenta). d, Average GCaMP traces and ΔF/F 
(max) quantifications from IPCs of fed or starved flies in which CN neurons were 
stimulated, or those of fed control (Con) flies. e, f, Relative intensities of dilp2 
immunoreactivity in IPCs (e) and tryptic peptide of dilp2 B chain from 
haemolymph (f) of fed flies in which CN neurons were inactivated, or those of 
control flies. g, Average GCaMP traces and ΔF/F (max) quantifications from IPCs 
of fed flies in which CN neurons were silenced by TNT in response to D-glucose or 
those of control flies. h, IPCs comprise three subpopulations according to their 
response to glucose; see Methods. Scale bars, 20 µm. Images are z-stacked 
projections. **P < 0.01 and ***P < 0.001; one-way ANOVA with Tukey post hoc test 
for (b, d) and unpaired two-tailed t-test for (e, f, g). See Supplementary Table 1 for 
the sample sizes and statistical analyses. AU, arbitrary units. 


CN neurons are glucose-excited 


We next sought to determine whether CN neurons respond to glucose 
and other sugars. Calcium-imaging studies using ex vivo brain preparations 
of flies carrying the calcium indicator UAS-GCaMP6s¹⁴ and CN-Gal4 
revealed that CN neurons were robustly activated by D-glucose with 
substantial calcium oscillations (Fig. 2a–c, Extended Data Fig. 3a–e). CN 
neurons also responded to D-trehalose and D-fructose, which are found 
in the haemolymph, but failed to respond to (1) the non-nutritive sugar 
L-glucose (Fig. 2b, d); (2) the non-haemolymph sugar sucrose; and (3) the 
non-sugar nutrients amino acids (Extended Data Fig. 3f-j). D-Glucose 
and D-trehalose are key sugars in the haemolymph, although D-trehalose 
stimulates the activity of CN neurons only after a substantial delay (about 
12 min), possibly because it requires additional metabolic steps to be 
converted to glucose. D-Fructose applied at 20 mM activated CN neurons, 
although the concentration of D-fructose in the haemolymph is 
much lower (<2 mM)>. These findings suggest that the pair of CN neurons 
responds only to D-glucose under normal physiological conditions. 

We next determined whether activation of CN neurons by D-glucose 
requires glucose metabolism inside the cell. Exposing the brain 
to D-glucose mixed with 2DG, phlorizin or nimodipine, which inhibit 
glycolysis, glucose transport or voltage-gated calcium channels, 
respectively, blunted the glucose-induced stimulation of CN neurons 
(Fig. 2b, d). In the presence of pyruvate (an end product of glycolysis), 
the CN neurons demonstrated activity similar to that seen in the presence 
of other haemolymph sugars (Extended Data Fig. 3f-j). Application 
of the ATP-sensitive potassium channel (KATP)!° blocker glibenclamide” 
resulted in activation of CN neurons (Fig. 2b, c). Furthermore, glucose-induced 
calcium transients of these neurons were not abrogated by the 
application of the sodium-channel blocker tetrodotoxin (TTX) (Fig. 2b). 
Using RNA-mediated interference (RNAi) lines, we also determined that 
glucose transporter 1 (Glut1), hexokinase C (Hex-C), a subunit of the KATP 
channel (SUR1) and the voltage-gated calcium channel are required in CN 
neurons for the two-choice behaviour (Extended Data Fig. 4a–c). Consistent 
with the behavioural results, the glucose-induced calcium response 
of CN neurons requires Glut1, SUR1 and a voltage-gated calcium channel 
(Extended Data Fig. 4d-h), further supporting the role of the intracellular 
glucose metabolic pathway in stimulating CN neuronal activity. 
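
Calcium responses of this kind are commonly summarized as the change in fluorescence relative to baseline (ΔF/F). The following Python sketch shows one way such a value could be computed from a raw fluorescence trace. The choice of baseline (mean of the first few frames), the function names and the example trace are assumptions made for illustration, not the procedure or data used in this study.

```python
from statistics import mean
from typing import List, Sequence


def delta_f_over_f(trace: Sequence[float], baseline_frames: int) -> List[float]:
    """Return (F - F0) / F0 per frame, with F0 the mean of the first baseline_frames."""
    if not 0 < baseline_frames <= len(trace):
        raise ValueError("baseline_frames must be between 1 and len(trace)")
    f0 = mean(trace[:baseline_frames])
    if f0 == 0:
        raise ValueError("baseline fluorescence must be non-zero")
    return [(f - f0) / f0 for f in trace]


def max_response(trace: Sequence[float], baseline_frames: int) -> float:
    """Peak dF/F after the baseline window (requires frames beyond the baseline)."""
    return max(delta_f_over_f(trace, baseline_frames)[baseline_frames:])


if __name__ == "__main__":
    # Hypothetical trace: three baseline frames followed by a stimulus response.
    trace = [100.0, 100.0, 100.0, 150.0, 200.0, 120.0]
    print(max_response(trace, baseline_frames=3))
```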

We next used the calcium-dependent nuclear import of LexA (CaLexA) 
system’ to measure cellular activity in CN neurons in intact flies, 
and found that GFP signal driven by the CaLexA system in starved flies 
was significantly reduced compared to the signal in fed flies, and the 
signal was restored when starved flies were refed D-glucose (Fig. 2e-g, 
Extended Data Fig. 4i, j). These results suggest that the activity of CN 
neurons is stimulated by the increase in glucose levels observed under 
fed conditions. In addition to the altered CaLexA signals, we evaluated 
the effect of glucose on the number and intensity of synaptotagmin 
(Syt)!°-GFP+ puncta in fed, starved and refed animals. The Syt-GFP+ signals 
decreased significantly in axon 1 in starved animals and returned to 
normal levels after the flies were fed with D-glucose (Fig. 3a, b, Extended 
Data Fig. 5a, c). However, this nutrient-dependent plasticity was not 
observed in Crz-Gal4-labelled axonal processes that did not originate 
from the dorsolateral CN neurons (Extended Data Fig. 5e-h). 


CN neurons promote dilp2 release 


We next sought to determine whether CN neurons are coupled with 
IPCs” at the synaptic level. We used a modified GFP reconstitution 
across synaptic partners (GRASP) method”, and found that the GRASP 
signals were visible around the synapse between CN neurons and insulin- 
producing cells (IPCs) (Fig. 3c), indicating physical coupling between 
CN neurons and IPCs. 

To determine whether the coupling between CN neurons and IPCs 
is functional, we expressed ATP-gated P2X2 purine receptors” in CN 
neurons and the calcium indicator GCaMP6s¹⁴ in IPCs, and then stimulated 
CN neurons using ATP while recording from the IPCs. As shown in 
Fig. 3d, ATP-induced CN-neuron activity was accompanied by a significant 
increase in the amplitude of GCaMP signals in the IPCs in fed flies; 
this effect was reduced in starved flies (Fig. 3d, Extended Data Fig. 6a, 
b). This finding supports the hypothesis that the nutrient-dependent 
synaptic changes observed between CN neurons and IPCs have functional 
consequences. The CN neurons did not appear to be functionally 
coupled to glucose-excited diuretic hormone 44 (Dh44) neurons⁵ 
(Extended Data Fig. 6c-f). Furthermore, we investigated whether CN 
neuronal activity is required for dilp2 secretion from IPCs. We observed 
a significant reduction in the intensity of dilp2 immunoreactivity in 
the IPCs of fed control flies, but not in fed flies in which CN neurons 
had been inactivated (Fig. 3e). These results suggest that an excitatory 
signal from the CN neurons contributes to the secretion of dilp2 from 
IPCs in response to increased glucose levels. Using mass spectrometry 
and dot blot assay, we further validated that the flies carrying CN-Gal4 
and UAS-Kir2.1 had lower dilp2 levels circulating in the haemolymph 
than wild-type flies, in contrast to the higher dilp2 levels found in IPCs 
(Fig. 3f, Extended Data Fig. 7a-c, e, f). 

To further clarify the role of CN neurons in mediating glucose-evoked 
activity in IPCs, we inactivated CN neurons by expressing TNT¹³ and 
then examined the responsiveness of IPCs to glucose. The amplitude 
of calcium signals in IPCs that had been exposed to D-glucose was significantly 
reduced when the CN neurons were inactivated (Fig. 3g). 
Furthermore, we found that IPCs harbour at least three subpopulations 
of neurons with distinct responses to glucose or KATP channel blocker 
(Fig. 3h and Extended Data Fig. 8a-f). These findings suggest that CN 
neuronal activity is required for the majority of IPCs to respond to 
glucose. 

Fig. 4 | AKH retention in AKH-producing cells requires an inhibitory signal 
from CN neurons. a, b, Representative images stained with GFP and AKH 
antibodies (a) and quantifications of the number of Syt-GFP+ puncta (green) 
(b) in axon 2 (A2) of CN neurons in fed, starved or refed flies carrying Crz-Gal4 
and UAS-Syt::GFP. Scale bar, 50 µm. c, Synaptic GRASP-induced GFP signals 
(arrowheads) co-stained with AKH antibody (magenta). Scale bar, 20 µm. 
d, e, Average GCaMP traces and ΔF/F (min) quantifications (d) and average 
Arclight traces and ΔF/F (max) quantifications (e) from AKH-producing cells of 
fed or starved flies in which CN neurons were activated or those of control flies. 
f, g, Relative intensities of AKH immunoreactivity in AKH-producing cells (f) and 
tryptic peptide of AKH from haemolymph (g) of fed flies in which CN neurons 
were inactivated or those of control flies. Scale bar, 20 µm. Images are z-stacked 
projections except in c. *P < 0.05, **P < 0.01 and ***P < 0.001; one-way ANOVA 
with Tukey post hoc test (b, e, f) and unpaired two-tailed t-test (d, g). Sample 
sizes and statistical analyses are shown in Supplementary Table 1. 


CN neurons inhibit AKH secretion from CC 


To determine whether nutrient-dependent plasticity also occurs in axon 
2 of the CN neurons, we monitored the number and intensity of Syt-GFP+ 
puncta before and after feeding flies with D-glucose. We observed a significant 
reduction in these parameters in starved flies and a restoration to normal 
levels after refeeding starved flies with D-glucose (Fig. 4a, b, Extended 
Data Fig. 5b, d). This raised the possibility of coupling between CN neurons 
and AKH-producing cells. Using a modified GRASP method”, we 
observed GRASP fluorescent signals around AKH-producing cells (Fig. 4c). 
To determine whether there is any functional connectivity between these 
cells, we activated the CN neurons while monitoring the activity of AKH-producing 
cells and found that calcium transients in the AKH-producing 
cells appeared to decrease during activation of CN neurons (Fig. 4d). 

To probe this observation further, we expressed the Arclight receptor”, 
which increases fluorescent signals when cells become hyperpolarized, 
in AKH-producing cells and P2X2 receptors in CN neurons. When the 
CN neurons were activated using ATP, the Arclight fluorescence intensity 
in fed flies increased significantly compared with that in starved flies 
(Fig. 4e), validating the occurrence of nutrient-dependent changes in the 
synapses between CN neurons and AKH-producing cells. Notably, when 
CN neurons were inactivated, the intracellular AKH levels decreased 
significantly compared with controls (Fig. 4f). Using mass spectrometry 
and dot blot assay, we confirmed significantly higher levels of AKH in 
haemolymph of flies carrying CN-Gal4 and UAS-Kir2.1 compared with 
those in control flies (Fig. 4g, Extended Data Fig. 7d, g, h). These findings 
suggest that CN neuronal activity inhibits the release of AKH from the 
CC and the increase of AKH levels in haemolymph. 


sNPF is the functional neurotransmitter 


We next investigated the identities of the key neurotransmitters in axon 
1 and axon 2 for regulating the functionally opposing synaptic activities. 
We tested the role of Crz and sNPF in the two-choice behaviour using 
RNAi lines and found that sNPF in CN neurons and sNPF receptor in the 
postsynaptic IPCs, but not Crz or its receptor, are important (Fig. 5a, 
b, Extended Data Fig. 9a-d). We also found that sNPF but not Crz levels 
in CN neurons were significantly reduced when CN neurons were 
exposed to D-glucose (Fig. 5c, Extended Data Fig. 9e). We observed 
that approximately half of the IPCs that had responded to glucose 
failed to respond to glucose when the dominant-negative allele of sNPF 
receptor” was expressed in IPCs (Fig. 5d and Extended Data Fig. 8g-i). 
Furthermore, we observed that intracellular AKH levels remained high 
in AKH-producing cells in fed control flies, but declined significantly in 
fed flies in which the function of sNPF receptor® was inhibited in AKH-producing 
cells (Fig. 5e). 

Finally, we determined whether sNPF alters activity of IPCs and/or 
the CC. The activity of IPCs was significantly stimulated by the application 
of sNPF”° (Extended Data Fig. 10a, b), whereas CC activity was 
significantly inhibited by sNPF (Fig. 5f). These functionally opposing 
effects of sNPF are probably mediated by Gq in IPCs and by Gi/o in AKH-producing 
cells via the sNPF receptor, which is a G-protein-coupled 
receptor”. Exposing the brain to U73122, a PLC inhibitor that inhibits the 
Gq pathway, eliminated the glucose-evoked activation of IPCs, but had 
no effect on sNPF-induced inhibition of AKH-expressing cells (Fig. 5f, g). 
Conversely, exposing the brain to pertussis toxin, a Gi inhibitor, blunted 
the sNPF-induced inhibition of AKH-producing cells, but had no effect on 
the glucose-evoked activation of IPCs (Fig. 5f, g). These results indicate 
that axon 1 and axon 2 can have opposing synaptic activities through a 
mechanism involving the same neurotransmitter and receptor but with 
distinct downstream factors coupled with opposing outputs. 
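
The inhibitor results in this paragraph amount to a small truth table: the PLC inhibitor U73122 blocks the activation of IPCs but not the inhibition of AKH-producing cells, and pertussis toxin (PTX) does the reverse. The Python sketch below encodes only that qualitative logic; the function names and the all-or-nothing boolean treatment are simplifying assumptions, and the measured effects were graded, not binary.

```python
def ipc_activated(signal_present: bool, u73122: bool, ptx: bool) -> bool:
    """IPC activation depends on the PLC-dependent pathway; PTX has no effect."""
    return signal_present and not u73122


def akh_cells_inhibited(signal_present: bool, u73122: bool, ptx: bool) -> bool:
    """Inhibition of AKH-producing cells is PTX-sensitive; U73122 has no effect."""
    return signal_present and not ptx


if __name__ == "__main__":
    for u73122, ptx in [(False, False), (True, False), (False, True)]:
        print(
            f"U73122={u73122}, PTX={ptx}: "
            f"IPC activated={ipc_activated(True, u73122, ptx)}, "
            f"AKH cells inhibited={akh_cells_inhibited(True, u73122, ptx)}"
        )
```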

To determine whether CN neuronal activity can alter circulating 
sugar levels in flies, we monitored circulating concentrations of glucose 
and trehalose in haemolymph, and found that they were significantly 
increased in flies in which CN neurons were inactivated compared with 
controls (Fig. 5h, Extended Data Fig. 10c). This finding illustrates that 
dysfunctional CN neuronal input to IPCs and AKH-producing cells results 
in a defect in glucose homeostasis. 


Discussion 

We identified and characterized a pair of glucose-sensing neurons in 
the Drosophila brain that have an essential role in maintaining glucose 
homeostasis. They do so by counterbalancing the activities of the 
Drosophila equivalents of insulin- and glucagon-producing cells. When 
food consumption leads to a rise in haemolymph sugar levels, CN neurons 
excite the IPCs through sNPF and its receptor, which appear to be coupled 
to the Gq signalling cascade to induce the secretion of dilp2, while 
suppressing the release of AKH by using the same sNPF receptor, which in 
this case is coupled to the Gi signalling pathway (Fig. 5i, Extended Data 
Fig. 10d). We speculate that precise control of these opposing functions 
is facilitated because the nutrient-dependent plastic changes arise from 
a single cell. 

This study demonstrates how the activity of the two key endocrine 
systems is coordinated in metazoans and that their coordination is under 
the direct control of glucose-sensing neurons. Such coordination has 
been proposed to occur in mammals via the sympathetic and parasympathetic 
nerves that connect the pancreatic islets with glucose-sensing neurons 
in the hypothalamus and hindbrain. The finding that a large proportion 
of IPCs respond to glucose through CN neurons in insects raises the 
intriguing possibility that both direct and indirect mechanisms control 
endocrine function in mammals. Finally, this work may shed light on the 
function of glucose-sensing neurons. Further research is needed to 
understand how these regulatory processes are affected by excessive 
nutrition and other metabolic disturbances, including obesity. 

Fig. 5 | sNPF is the functional neurotransmitter of CN neurons. a, Knockdown 
of sNPF in flies carrying Crz-Gal4 and UAS-sNPF RNAi abolishes a preference for 
D-glucose in starved flies. b, Expression of the dominant-negative sNPF receptor 
(UAS-sNPFR-DN) using dilp2-Gal4 blunts a preference for D-glucose in starved 
flies. c, Immunoreactivity of intracellular sNPF in CN neurons when the brains 
were incubated in 80 mM sucrose, D-glucose, D-glucose mixed with 0.5 µM TTX 
(D-Glc + TTX) or L-glucose in artificial haemolymph-like solution (AHL). Scale 
bar, 5 µm. d, Average GCaMP traces and ΔF/F (max) quantifications from IPCs of 
fed flies in which UAS-sNPFR-DN was expressed in IPCs in response to D-glucose, 
or those of control flies. e, Immunoreactivity of intracellular AKH in CC of flies 
carrying UAS-sNPFR-DN and AKH-Gal4, or those of control flies. Scale bar, 20 µm. 
f, Average Arclight traces and ΔF/F (max) quantifications from CC in response to 
sNPF mixed with Gi inhibitor, pertussis toxin (PTX), Gq (phospholipase C, PLC) 
inhibitor U73122 or DMSO-only controls. g, Average GCaMP traces and ΔF/F 
(max) quantifications from IPCs in response to D-glucose mixed with U73122, 
PTX or U73343, a non-functional enantiomer of U73122; see Methods. h, 
Circulating glucose levels of flies in which CN neurons were silenced, or those in 
control flies. i, Schematics of the functional connectivity among CN neurons, 
IPCs and AKH-producing cells. Images are z-stacked projections. *P < 0.05, 
**P < 0.01 and ***P < 0.001; one-way ANOVA with Tukey post hoc test and 
unpaired two-tailed t-test in d. Sample sizes and statistical analyses are shown in 
Supplementary Table 1. 


Methods 


Fly strains 

Flies were raised on standard cornmeal-molasses food (940 ml water, 
9 g agar, 15 g yeast, 36 g cornmeal, 36 ml molasses, 1.12 g tegosept, 3.8 
ml propionic acid, total 1 l of fly food) at 23 °C with 12 h:12 h light:dark 
cycles. Fly strains were obtained as described in Supplementary Table 2. 
All the lines used for behavioural testing were backcrossed into the w1118 
(Bloomington no. 6326) background for at least five generations. 


Two-choice assay 

The two-choice preference assay was performed as previously 
described. In brief, approximately 30-40 male flies (1-3 days old) were 
collected under CO2 anaesthesia and allowed to recover for 2-3 more 
days. The flies were starved in an empty vial containing a Kimwipe wetted 
with 2-3 ml of distilled water for 5 h (fed condition) or 24 h (starved 
condition). They were introduced into a two-choice arena containing two 
food sources, 50 mM D-glucose and 200 mM L-glucose, that were coloured 
with a tasteless food dye (1% of red food dye and 0.7% of green food 
dye, McCormick), where flies were allowed to feed for 2 h in the dark. 
The majority of flies preferred the sweeter L-glucose (200 mM) when fed, 
but chose the nutritive D-glucose (50 mM) when starved. For experiments 
using UAS-Kir2.1, tub-Gal80ts, flies were raised at 18 °C before being 
transferred to 30 °C for 48-72 h to inactivate Gal80 and thereby express 
Kir2.1. Subsequent behavioural testing was conducted at 23 °C. For 
experiments using flies that do not bear tubulin-Gal80ts, behavioural 
testing was also conducted at 23 °C. All sugars (D-glucose, D-trehalose 
and D-fructose) were purchased from Sigma except L-glucose (Carbosynth). 
Statistical analyses were performed using GraphPad Prism 8.1.1. Food 
preference was scored as a percentage of preference index (% PI), 
calculated as shown below (where f_n is the number of flies that ate 
food n): 


\%\,\mathrm{PI} = \frac{(f_1 + 0.5 \times f_{1+2}) - (f_2 + 0.5 \times f_{1+2})}{f_1 + f_2 + f_{1+2}} \times 100\%
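
The preference index can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' scoring script: it assumes that flies that ate both foods contribute half a count to each side and that the denominator is the total number of flies that fed, as in the usual form of this index. The function and argument names (preference_index, f1, f2, f_both) are hypothetical.

```python
def preference_index(f1: int, f2: int, f_both: int) -> float:
    """Return the % preference index for one two-choice trial.

    f1     -- flies that ate only food 1 (for example, D-glucose)
    f2     -- flies that ate only food 2 (for example, L-glucose)
    f_both -- flies that ate both foods (assumed to count 0.5 for each side)

    Positive values indicate a preference for food 1, negative for food 2.
    """
    total_fed = f1 + f2 + f_both  # assumption: only flies that fed are scored
    if total_fed == 0:
        raise ValueError("no flies fed; the preference index is undefined")
    score_1 = f1 + 0.5 * f_both
    score_2 = f2 + 0.5 * f_both
    return (score_1 - score_2) / total_fed * 100.0


if __name__ == "__main__":
    # Made-up counts for illustration only.
    print(preference_index(f1=20, f2=8, f_both=4))
```

Because the half-counts for flies that ate both foods cancel in the numerator, those flies only dilute the index through the denominator.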


Two-choice behavioural screen using VT-Gal4 drivers 

We obtained more than 100 Gal4 drivers from the Vienna Drosophila 
RNAi Center (VDRC, Vienna Tiles (VT) library). We used the two-choice 
assay to screen 95 VT-Gal4 drivers, which had been crossed to UAS-Kir2.1, 
tub-Gal80ts (inward-rectifier potassium ion channel allele, Kir2.1, with 
tubulin-temperature-sensitive Gal80). We found 18 candidate Gal4 drivers 
for which the flies preferred L-glucose even when they were starved, and 
inspected the expression pattern of each Gal4 driver. Among those 
candidates, we focused on two Gal4 drivers, VT58471 and VT43147, because 
a pair of neurons in the dorsal-lateral area of the brain overlapped 
clearly in the two candidate Gal4 drivers. 


Intersectional approach to define CN neurons at single-cell 
resolution 

To define CN neurons at single-cell resolution, we combined LexA-driven 
flippase (LexAop-FLP) with UAS-FRT-stop-FRT-effector to limit the 
expression of the effector to the areas in which Gal4 and LexA lines 
overlap. This induced the expression of UAS-FRT-stop-FRT-smGFP under the 
control of CN-Gal4 and another independent line, R20F11-LexA. R20F11-LexA 
also labels CN neurons and exhibited a defect when crossed to LexAop-TNT 
in the two-choice assay. We also used this approach to induce the 
expression of UAS-FRT-stop-FRT-TNT, under the control of CN-Gal4 and 
R20F11-LexA, to silence a pair of CN neurons. 


Immunostaining 

Immunohistochemistry of the brain, ventral nerve cord (VNC), CC and 
foregut was conducted as previously described. In brief, the brains were 
dissected in PBS (1.86 mM NaH2PO4, 8.41 mM Na2HPO4, 175 mM NaCl) 
and fixed in 4% paraformaldehyde (PFA)/PBS for 30 min at 23 °C. After 
washing in PBST (PBS + 0.3% Triton X-100, Invitrogen) (3 times, 10 min 
each), the brains were blocked in 10% normal goat serum (NGS, Jackson 
Immunoresearch, T-005-000-121) in PBST for 1 h at 23 °C, and then 
incubated with primary antibodies for 12-48 h at 4 °C. After washing in 
PBST (3 times, 10 min each), the sample brains were incubated with 
secondary antibodies for 12-24 h at 4 °C and washed again using PBST 
(3 times, 10 min each). Primary antibodies that were used: chicken 
anti-GFP (1:500; Invitrogen, A10262), rabbit anti-GFP (1:500; Invitrogen, 
A-11122), mouse anti-GFP (1:100; Sigma, G6539, used for synaptobrevin 
(Syb)-GRASP), mouse anti-nc82 (1:25; Development Studies Hybridoma Bank, 
DSHB, AB-2314866), rabbit anti-dsRed (1:500; Clontech, 632496), rabbit 
anti-corazonin (Crz) (1:500; a gift from J. Veenstra, Université de 
Bordeaux), rabbit anti-sNPF (1:500; a gift from J. Veenstra, Université 
de Bordeaux, France), rabbit anti-dilp2 (1:500; a gift from E. Hafen, 
Institute for Molecular Systems Biology, Zürich) and rabbit anti-AKH 
(1:500; gifts from J. H. Park, University of Tennessee and S. K. Kim, 
Stanford University) antibodies. Secondary antibodies that were used: 
Alexa Fluor 633 goat anti-rabbit IgG (1:500; Invitrogen, A-21070), Alexa 
Fluor 555 goat anti-mouse IgG (1:500; Invitrogen, A-21127), Alexa Fluor 
555 goat anti-rabbit IgG (1:500; Invitrogen, A27039), Alexa Fluor 488 
goat anti-rabbit IgG (1:500; Invitrogen, A27034), Alexa Fluor 488 goat 
anti-mouse IgG (1:500; Invitrogen) and Alexa Fluor 488 goat anti-chicken 
IgG (1:500; Invitrogen, A28175). We used Vectashield (Vector Labs, H1000) 
for mounting the samples. All images were acquired using a Zeiss LSM 800 
confocal microscope (Zeiss) with a 25× lens at 1,024 × 1,024 resolution. 
Z-stacked images were constructed using ZEN image analysing software 
(Carl Zeiss, ZEN 2.3 SP1FP1, v.14.0.12.201). Quantifications and 
statistical analyses were conducted using ImageJ and GraphPad Prism 
8.1.1, respectively. 


Ex vivo GCaMP and Arclight imaging 
Ex vivo calcium imaging using GCaMP6s was performed as described. 
Adult male fly brains or CC cells in the foregut were dissected in 
AHL (108 mM NaCl, 8.2 mM MgCl2, 4 mM NaHCO3, 1 mM NaH2PO4, 2 mM 
CaCl2, 5 mM KCl, 5 mM HEPES, an appropriate amount of sucrose to balance 
osmolarity; the pH was adjusted to 7.3 with 1 M NaOH) and were 
immobilized by a tissue holder (Warner Instruments) on a Sylgard-based 
perfusion chamber. Sugars (D-glucose, D-trehalose, D-fructose and 
sucrose), 2DG, phlorizin, pyruvate and L-essential amino acids (EAAs) 
(1× L-(10)-EAAs: 0.6 mM L-Arg, 0.2 mM L-His, 0.4 mM L-Ile, 0.4 mM L-Leu, 
0.4 mM L-Lys, 0.1 mM L-Met, 0.2 mM L-Phe, 0.4 mM L-Thr, 0.05 mM L-Trp 
and 0.4 mM L-Val) were purchased from Sigma. TTX, glibenclamide and 
nimodipine were purchased from Tocris. We mixed AHL containing 20 mM 
D-glucose with 0.5 µM TTX, 20 mM 2DG, 1 mM phlorizin or 5 µM nimodipine. 
The concentrations of TTX, glibenclamide, 2DG, phlorizin, nimodipine, 
sucrose, pyruvate and L-essential amino acids (L-(10)-EAAs) that were 
used in the experiments were determined on the basis of pilot 
experiments. Ex vivo Arclight imaging was performed as previously 
described with a modification. Each brain was recorded for 200-500 
frames in total (512 × 512 pixels; each frame, 5 s). After imaging, the 
condition of the cells was checked by applying AHL solution containing 
80 mM KCl. Exchange of different solutions was automatically controlled 
by a ValveBank controller (AutoMate Scientific). Changes in fluorescent 
intensity were recorded using a Prairie two-photon microscope and Prairie 
View software v.4.3.2.18 (Prairie Technologies Inc.) with a 40× 
water-immersion lens (Olympus). All image analyses were conducted using 
ImageJ. A region of interest (ROI) was centred using the StackReg plugin 
in ImageJ. 
Peak amplitude (% ΔF/F) was obtained by subtracting the amplitude of 
the pre-stimulation baseline (average of 30 frames, 1 frame = 5 s) from 
the stimulation-evoked peak amplitude. Oscillation number refers to the 
total number of calcium oscillations during stimulation. Duration is the 
length of the calcium response during stimulation. Oscillation frequency 
is the ratio of oscillation number to duration. Max ΔF/F 
(%) = ((F_max − F_0)/F_0) × 100; F_max, maximum fluorescence observed 
during stimulation; F_0, average fluorescence of 30 baseline frames. 
Min ΔF/F (%) = ((F_min − F_0)/F_0) × 100; F_min, minimum fluorescence 
observed during stimulation. Max ΔF/F (%) was used to quantify increases 
in the fluorescent signal of cells during stimulation, and min ΔF/F (%) 
was used to quantify decreases in the fluorescent signal of cells during 
stimulation. Min ΔF/F (%) was used only in Fig. 4d. Statistical analyses 
were conducted with GraphPad Prism 8.1.1. 
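
The max and min ΔF/F definitions above can be written out as a short Python sketch. It is an illustration only, not the authors' ImageJ workflow: it assumes the trace is a list of mean ROI fluorescence values, one per frame, with the first 30 frames forming the baseline and every later frame treated as the stimulation period. The function name delta_f_over_f and its parameters are hypothetical.

```python
from statistics import mean


def delta_f_over_f(trace, baseline_frames=30):
    """Return (max_dff, min_dff) in percent for one ROI fluorescence trace.

    trace           -- sequence of mean ROI fluorescence values, one per frame
    baseline_frames -- number of pre-stimulation frames averaged to give F_0
    """
    if len(trace) <= baseline_frames:
        raise ValueError("trace must extend beyond the baseline window")
    f0 = mean(trace[:baseline_frames])
    if f0 == 0:
        raise ValueError("baseline fluorescence is zero; ΔF/F is undefined")
    # Assumption: all frames after the baseline belong to the stimulation period.
    stimulation = trace[baseline_frames:]
    max_dff = (max(stimulation) - f0) / f0 * 100.0
    min_dff = (min(stimulation) - f0) / f0 * 100.0
    return max_dff, min_dff


if __name__ == "__main__":
    # Synthetic trace: flat baseline of 100, then a rise to 150 and a dip to 80.
    example = [100.0] * 30 + [100.0, 150.0, 80.0]
    print(delta_f_over_f(example))
```

In practice the stimulation window would be set from the perfusion timing rather than taken as the whole remainder of the recording.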


In vivo GCaMP imaging 

In vivo calcium imaging using GCaMP6s was modified from a previous 
study. Adult male flies were collected (1-2 days old) and allowed to 
recover on standard fly food for two more days. They were then fixed on 
the Sylgard-based perfusion chamber using epoxy resin (Devcon), and 
their head cuticles were removed to make a window for in vivo calcium 
imaging. Other procedures were similar to those used for ex vivo 
calcium imaging. 


CaLexA measurement 

Adult male flies carrying Crz-Gal4 (or CN-Gal4) and UAS-CaLexA (UAS- 
mLexA-VP16-NFAT, LexAop-GFP) were collected 1-2 days after eclosion 
and recovered for 2-3 more days in standard fly food at 23 °C. A group 
of flies were dissected and their brains were fixed with 4% PFA in PBS for 
20 min without antibody staining (fed flies). Another group of flies after 
24h of starvation (wet starvation) were dissected and their brains were 
fixed similarly (starved flies). We refed some of the starved flies with 
200 mM D-glucose (with 1% agar) for 24 h and their brains were fixed 
similarly (refed flies). Native CaLexA-driven GFP signals were captured 
using confocal microscopy (LSM 800, Zeiss, 25× lens). The brains were 
co-stained with anti-Crz antibody. The intensity of the CaLexA-driven 
GFP signal was normalized to the intensity of anti-Crz staining in CN 
neurons. The CaLexA-driven GFP signal was measured and normalized using 
the corrected total fluorescence (CTF) method, calculated by subtracting 
the background endogenous fluorescence from the integrated GFP 
signal of an area of interest. Z-stacked images were constructed using 
ZEN software (Zeiss) and analysed using ImageJ. 
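
As a rough illustration of the corrected total fluorescence calculation, the sketch below uses the commonly applied form CTF = integrated density − (ROI area × mean background fluorescence), followed by a ratio against a reference stain. The text states only that background fluorescence is subtracted from the integrated signal, so the area-scaled background term and the function names are assumptions, not the authors' exact procedure.

```python
def corrected_total_fluorescence(integrated_density, area, mean_background):
    """Return CTF for one region of interest.

    integrated_density -- summed pixel intensity inside the ROI
    area               -- ROI area in pixels (or the same unit used by the tool)
    mean_background    -- mean intensity of a nearby signal-free region
    """
    # Assumed form: scale the background by the ROI area before subtracting.
    return integrated_density - area * mean_background


def normalized_signal(signal_ctf, reference_ctf):
    """Express a reporter CTF relative to a reference stain (e.g. anti-Crz)."""
    if reference_ctf == 0:
        raise ValueError("reference signal is zero; cannot normalize")
    return signal_ctf / reference_ctf


if __name__ == "__main__":
    # Made-up measurements for illustration only.
    gfp = corrected_total_fluorescence(5000.0, 100.0, 12.5)
    crz = corrected_total_fluorescence(9000.0, 100.0, 15.0)
    print(gfp, crz, normalized_signal(gfp, crz))
```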


Measurement of nutrient-dependent plasticity 

To measure nutrient-dependent plasticity of CN axons and presynap- 
tic terminals, adult male flies carrying Crz-Gal4 (or CN-Gal4) and UAS- 
Syt::GFP were collected 1-2 days after eclosion and recovered for 2-3 
more days in standard fly food at 23 °C. A group of flies were dissected 
and their brains were stained using anti-GFP antibody (fed flies). Another 
group of flies were dissected after 24 h of starvation (wet starvation) and 
their brains were stained similarly (starved flies). We refed some of the 
starved flies with 200 mM D-glucose (with 1% agar) for 24 h and their 
brains were stained similarly (refed flies). We measured and quantified 
the number of Syt-GFP+ puncta and the length of axons as described 
in a previous study. The intensity of the Syt-GFP signal was measured 
and normalized using the corrected total fluorescence (CTF) method, 
calculated by subtracting the background endogenous fluorescence from 
the integrated GFP signal in an area of interest. Images were captured 
by Zeiss LSM 800 confocal microscopy (Zeiss) and were analysed using 
ImageJ. Statistical analyses were conducted with GraphPad Prism 8.1.1. 


GRASP analysis 

In Fig. 3c, experimental flies carrying CN-Gal4, UAS-Neurexin 
(Nrx)::sGFP1-10, R20F11-LexA and LexAop-CD4::sGFP11 and control flies 
carrying CN-Gal4, UAS-Neurexin (Nrx)::sGFP1-10 and LexAop-CD4::sGFP11 
were used to examine the physical connectivity between CN neurons and 
IPCs. We expressed one component of the split GFP with the presynaptic 
marker neurexin in CN neurons and another component in IPCs. In 
Fig. 4c, experimental flies carrying R20F11-LexA, LexAop-Syb::GFP1-10, 
AKH-Gal4 and UAS-CD4::GFP11 and control flies carrying R20F11-LexA, 
LexAop-Syb::GFP1-10 and UAS-CD4::GFP11 were used to examine the 
physical connectivity between CN neurons and AKH-producing cells. 
We expressed one component of the split GFP with the presynaptic 
marker, synaptobrevin, in the CN neurons and another component in 
the AKH-producing cells. To measure native GRASP-induced GFP signals, 
adult male flies were collected 3-5 days after eclosion and their brains 
were dissected in ice-cold PBS, fixed in 4% PFA/PBS for 20 min at 
23 °C and washed in PBST (3 times, 10 min each). The brains were then 
mounted in Vectashield medium (Vector Labs, H1000) without antibody 
staining. To measure the synaptobrevin-GRASP-induced GFP signals, 
adult male flies were collected 3-5 days after eclosion and their brains 
were dissected in PBS and fixed in 4% PFA/PBS for 30 min at 23 °C. These 
brains were then blocked using 10% NGS/PBST (1 h at 23 °C), and washed 
in PBST (3 times, 10 min each) before subsequent immunolabelling with 
mouse anti-GFP antibody to probe GFP signals as previously described. 
The brains were mounted in Vectashield medium (Vector Labs, H1000) 
after the staining was complete. 


Functional connectivity using P2X2 system 

Experimental flies carrying dilp2-LexA, LexAop-GCaMP6s, CN-Gal4 and 
UAS-P2X2 and control flies carrying dilp2-LexA, LexAop-GCaMP6s and 
UAS-P2X2 were used to examine the functional connectivity between CN 
neurons and IPCs. Experimental flies harbouring R20F11-LexA, 
LexAop-P2X2, AKH-Gal4 and UAS-GCaMP6s and control flies bearing 
LexAop-P2X2, AKH-Gal4 and UAS-GCaMP6s, and experimental flies carrying 
R20F11-LexA, LexAop-P2X2, AKH-Gal4 and UAS-Arclight and control flies 
harbouring LexAop-P2X2, AKH-Gal4 and UAS-Arclight were used to examine 
the functional connectivity between CN neurons and AKH-producing cells. 
To clarify the nutrient-dependent changes in the synapses between CN 
neurons and IPCs, and between CN neurons and AKH-producing cells, 
we used experimental flies fed with normal food, 24-h-starved (wet 
starvation) experimental flies, or control flies fed with normal food. We 
applied 2.5 mM ATP (Sigma, A26209) for 50 s (10 frames, 5 s per frame) 
in AHL to excite cells expressing the P2X2 receptor. Changes in 
fluorescent intensity were recorded using a Prairie two-photon microscope 
with a 40× water-immersion lens (Olympus). Image analyses were conducted 
using ImageJ. A region of interest (ROI) was centred using the StackReg 
plugin in ImageJ. Graph plotting and statistical analyses were conducted 
with GraphPad Prism 8.1.1. 


Dilp2 and AKH secretion assay 

To measure dilp2 secretion/retention, adult male flies (1-3 days old) 
were collected under CO2 anaesthesia and allowed to recover for 2-3 
more days on standard fly food. A group of flies were starved with water 
(wet starvation) for 24 h. Their brains were dissected and stained using 
anti-dilp2 antibody. Some of the starved flies were refed with 200 mM 
D-glucose (with 1% agar) for 24 h, and their brains were dissected and 
stained using anti-dilp2 antibody. All the processes were conducted at 
23 °C. To measure AKH secretion and retention, adult male flies (1-3 
days old) were collected under CO2 anaesthesia and allowed to recover 
for 2-3 more days on standard fly food. Their brains were dissected and 
stained using anti-AKH antibody as described. The analysis of Z-stacked 
images was performed using ZEN software (Zeiss) and ImageJ. 


Quantification of dilp2 and AKH by PRM mass spectrometry 

Haemolymph of 500-600 flies that were fed with standard fly food 
was collected and pooled as previously described. Dilp2 and AKH 
were extracted from the control (UAS-Kir2.1/+) and experimental 
(UAS-Kir2.1/CN-Gal4, CN silenced) samples using two volumes of cold 
ethanol, following a previously described method with modification: 
100 µl of haemolymph was mixed with 200 µl of ice-cold ethanol (Fisher 
Scientific, 90% ethyl alcohol (v/v) with 5% isopropanol and 5% methanol) 
and incubated at −20 °C for 30 min, then centrifuged at 15,000g 
for 20 min. Supernatants were carefully removed and dried to a small 
droplet by vacuum centrifugation. Extracts were solubilized in 100 µl 
of 100 mM ammonium bicarbonate, and disulphide bonds were reduced with 
500 mM dithiothreitol (DTT) to a final concentration of 10 mM DTT for 
1 h at 37 °C. Cysteines were alkylated with iodoacetamide (200 mM) to a 
final concentration of 40 mM at 37 °C for 1 h in the dark. Excess 
iodoacetamide was quenched by adding DTT to a final concentration of 
40 mM with incubation for 30 min, in the dark, at room temperature. 
Samples were each digested with 800 ng to 1 µg of trypsin (Trypsin Gold, 
Mass Spectrometry Grade, Promega) overnight, and after acidification with 
10% formic acid (final concentration of 0.5-1% formic acid), the resulting 
peptides were desalted using hand-packed reversed-phase Empore C18 
Extraction Disks (3M) using a previously described method. Desalted 
peptides were concentrated to a small droplet by vacuum centrifugation 
and reconstituted in 10 µl 0.1% formic acid in water. Of the peptide 
material, 10% was used for data-dependent acquisition (DDA) and 50% was 
used for targeted parallel reaction monitoring (PRM) liquid 
chromatography followed by tandem mass spectrometry (LC-MS/MS). A Q 
Exactive HF mass spectrometer was coupled directly to an EASY-nLC 1000 
(Thermo Fisher Scientific) equipped with a self-packed 75 µm × 20 cm 
reverse-phase column (ReproSil-Pur C18, 3 µm, Dr. Maisch) for peptide 
separation. Analytical column temperature was maintained at 50 °C by a 
column oven (Sonation). Peptides were eluted with a 3-40% acetonitrile 
gradient over 110 min at a flow rate of 250 nl min−1. For DDA, the mass 
spectrometer was operated in DDA mode with survey scans acquired at 
a resolution of 120,000 (at m/z 200) over a scan range of 300-1750 m/z. 
Up to 15 of the most abundant precursors from the survey scan were 
selected with an isolation window of 1.6 Th and fragmented by 
higher-energy collisional dissociation with normalized collision energy 
(NCE) of 27. The maximum injection time for the survey and MS/MS scans 
was 60 ms and the ion target value (AGC) for both scan modes was set to 
3e6. For PRM analysis, 1 full MS scan was acquired at 60,000 resolution 
followed by MS/MS of 9 target precursor m/z loaded as the inclusion 
list. Each PRM target peptide was analysed at resolution 15,000 with an 
isolation window of 1.4 Th. The ion target value was set to 5e5. 


Mass spectrometry data processing 

RAW files generated from the DDA experiments were analysed by 
MaxQuant proteomics software (v.1.5.7.0) using a Drosophila fasta 
(19,694 entries) database. Files from PRM experiments were analysed 
with Thermo Scientific Xcalibur (v.4.1.31.9) software. Layouts 
containing target precursor and fragment masses were created with a mass 
accuracy set to 5 ppm. At specific events when MS1 and MS/MS fragments 
were aligned, the intensity of each specific precursor and fragment 
was extracted and noted. Additional confirmation of the analysis 
was carried out using Skyline (v.4.10.18169) proteomics software. Ion 
intensity extraction and alignment of spectra were visually inspected 
and confirmed in each case. Processed intensities of dilp2 B chain, AKH 
and their fragment ions were normalized to the estimated total protein 
level (top 3 precursor intensity of top 99 proteins) in the haemolymph. 
Normalization of precursor peptide and fragment intensities was done 
by analysing the DDA spectra from each set of replicate control and 
experimental samples by MaxQuant label-free quantitative analysis; 
intensities of the top three peptides from proteins found in both 
samples were summed and compared (see Supplementary Table 3). The 
nomenclature for peptide fragmentation during mass spectrometry 
was described in previous studies. 

Note: We also performed several experiments using methanol to 
extract peptides from the haemolymph. The dilp2 and AKH quantitation 
results from those PRM experiments were consistent with results 
obtained using the ethanol-extracted materials presented in this 
manuscript, but contained additional contaminants that interfered with 
liquid chromatography (data not shown). 
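
The normalization described in this section (top 3 precursor intensity of top 99 proteins) can be sketched as follows. This is a hedged illustration, not the MaxQuant or Skyline workflow: it assumes that each protein is represented by a list of peptide intensities, that the per-protein score is the sum of its three most intense peptides, and that the top proteins are ranked by that score. The ranking rule and all names are assumptions.

```python
def top3_intensity(peptide_intensities):
    """Sum the three largest peptide intensities of one protein."""
    return sum(sorted(peptide_intensities, reverse=True)[:3])


def total_protein_estimate(proteins, n_proteins=99):
    """Estimate total protein from the n highest-scoring proteins.

    proteins   -- dict mapping protein id to a list of peptide intensities
    n_proteins -- how many proteins to include (99 in the text)
    """
    scores = sorted((top3_intensity(p) for p in proteins.values()), reverse=True)
    return sum(scores[:n_proteins])


def normalize_target(target_intensity, proteins, n_proteins=99):
    """Express a target peptide intensity relative to the total protein estimate."""
    total = total_protein_estimate(proteins, n_proteins)
    if total == 0:
        raise ValueError("total protein estimate is zero; cannot normalize")
    return target_intensity / total


if __name__ == "__main__":
    # Made-up intensities for illustration only.
    sample = {"A": [10.0, 5.0, 3.0, 1.0], "B": [4.0, 2.0], "C": [1.0]}
    print(normalize_target(12.0, sample, n_proteins=2))
```

Comparing the normalized values between control and CN-silenced samples then corrects for differences in the amount of haemolymph protein loaded.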


Dot blot assay 

Haemolymph of 60-80 flies that were fed with standard fly food was 
collected as previously described. The total protein in the haemolymph 
was measured by Bradford assay (Bio-Rad) and the concentration 
was adjusted to 1 mg ml−1 in PBS. Then, 10 µl of haemolymph was dropped 
onto a 0.2 µm nitrocellulose membrane (GE Healthcare) and left at room 
temperature until dry. The membrane was then boiled in PBS 
for 3 min and subsequently fixed in 4% PFA in PBS for another 20 min. 
The membrane was blocked with 3% bovine serum albumin (BSA) in 
PBS for 1 h at room temperature (21 °C) and then incubated with mouse 
anti-HA (1:500; Covance, 901501) antibody or purified rabbit anti-AKH 
antibody (1:500; a gift from S. K. Kim, Stanford University) in 3% BSA 
in PBS at 4 °C overnight, followed by incubation with horseradish 
peroxidase-conjugated secondary antibodies in 3% BSA in PBS for 1 h at 
room temperature (21 °C). The Pdilp2-dilp2-HA line was used for detecting 
dilp2 in the haemolymph. The membrane was developed using Chemidoc M 
(Bio-Rad). The intensities of the black dots were taken as the 
amounts of dilp2 and AKH in flies. Ponceau staining was used as a 
loading control. The quantification and analysis of dot blot results 
were conducted using Fiji software and GraphPad Prism 8.1.1, respectively. 
For gel source data, see Supplementary Fig. 1. 


Measurement of intracellular sNPF and Crz levels in CN 
neurons and identification of CN axons and dendrites 

Intracellular sNPF and Crz levels after incubating the brains with sugars 
were measured as previously described. The brains of 18-h-starved 
flies in which UAS-mCD8::GFP was expressed in Crz-expressing neurons 
(Crz-Gal4 > UAS-mCD8::GFP) were dissected in cold AHL (sugar free), 
incubated in AHL containing 80 mM sucrose, 80 mM D-glucose (D-Glc), 
80 mM D-glucose mixed with 0.5 µM TTX (D-Glc + TTX) or 80 mM L-glucose 
(L-Glc) for 30 min and then fixed with 4% PFA/PBS and stained with 
anti-sNPF or anti-Crz antibodies as described earlier. All images were 
acquired using a Zeiss LSM 800 confocal microscope (Zeiss) with a 25× 
lens at 1,024 × 1,024 resolution. Z-stacked images were constructed 
using ZEN image analysing software (Zeiss). Quantifications and 
statistical analyses were conducted using ImageJ and GraphPad Prism 8.1.1, 
respectively. To identify the axons and dendrites of CN neurons, we 
expressed UAS-Syt::eGFP under the control of the CN-Gal4 line to visualize 
the axons and UAS-DenMark to visualize the dendrites. We stained 
the brain and gut tissue with anti-GFP antibody to detect the signals 
driven by UAS-Syt::eGFP and anti-dsRed antibody to detect the signals 
driven by UAS-DenMark. DenMark (Dendritic Marker, ΔICAM5-Cherry, a 
hybrid protein of ICAM5 (also known as telencephalin) and mCherry) is a 
specially designed protein to label dendrites. We used a dsRed antibody 
to visualize mCherry. 


Classification of the IPCs into three subpopulations 

When the max ΔF/F (%) of a cell was higher than 50% of the average max 
ΔF/F (%) in control brains, we counted the cell as a 'stimulated' cell. 
We classified the IPCs into stimulated or unresponsive cells after 
applying D-glucose or a K_ATP channel blocker to the brains of flies in 
which CN neurons had been silenced or in which the sNPF receptor in IPCs 
had been inactivated, or of control flies. Using this approach, we 
categorized IPCs into three subpopulations. In Fig. 3h, 74.42% of IPCs 
(32/43 cells) in control flies carrying R20F11-LexA, dilp2-Gal4 and 
UAS-GCaMP6s responded to D-glucose, whereas 25.58% of the IPCs (11/43 
cells) failed to respond to D-glucose. 20.69% of IPCs (12/58 cells) 
responded to D-glucose in experimental flies carrying R20F11-LexA, 
LexAop-TNT, dilp2-Gal4 and UAS-GCaMP6s, whereas 53.73% of the IPCs 
failed to respond to glucose. In Extended Data Fig. 8d, 71.43% of IPCs 
(20/28 cells) in control flies carrying R20F11-LexA, dilp2-Gal4 and 
UAS-GCaMP6s responded to glibenclamide, whereas 28.57% of the IPCs 
(8/28 cells) failed to respond to glibenclamide. 23.08% of IPCs (9/39 
cells) in experimental flies carrying R20F11-LexA, LexAop-TNT, 
dilp2-Gal4 and UAS-GCaMP6s responded to glibenclamide, whereas 48.35% 
of the IPCs failed to respond to glibenclamide. In Extended Data 
Fig. 8g, 75.68% of IPCs (28/37 cells) in control flies carrying 
dilp2-Gal4 and UAS-GCaMP6s responded to D-glucose, whereas 24.32% of the 
IPCs (9/37 cells) failed to respond. 27.27% of IPCs (9/33 cells) 
responded to D-glucose in experimental flies carrying dilp2-Gal4, 
UAS-sNPFR-DN and UAS-GCaMP6s, whereas 48.41% of the IPCs failed to 
respond. 
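
The arithmetic behind these three subpopulations can be made explicit with a short Python sketch. It is an interpretation, not the authors' analysis code: the reported values are consistent with taking the responder fraction in manipulated flies as one subpopulation, the drop in responder fraction relative to controls as a second, and the non-responders in controls as the third. The dictionary keys and function names below are hypothetical labels.

```python
def is_stimulated(max_dff, control_mean_max_dff):
    """Apply the 50%-of-control-mean threshold to one cell's max ΔF/F (%)."""
    return max_dff > 0.5 * control_mean_max_dff


def subpopulation_percentages(control_responders, control_total,
                              manipulated_responders, manipulated_total):
    """Split IPCs into three percentages from responder counts.

    The labels are illustrative: 'still_responding' cells respond despite
    the manipulation, 'response_lost' is the drop relative to controls and
    'unresponsive' cells do not respond even in controls.
    """
    control_pct = 100.0 * control_responders / control_total
    still_pct = 100.0 * manipulated_responders / manipulated_total
    return {
        "still_responding": still_pct,
        "response_lost": control_pct - still_pct,
        "unresponsive": 100.0 - control_pct,
    }


if __name__ == "__main__":
    # Counts reported for Fig. 3h: 32/43 control and 12/58 CN-silenced IPCs.
    for label, pct in subpopulation_percentages(32, 43, 12, 58).items():
        print(label, round(pct, 2))
```

With the Fig. 3h counts, the three rounded percentages match the 20.69%, 53.73% and 25.58% given in the text.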
sNPF neuropeptide treatment to the IPCs and CC 

sNPF neuropeptide was administered to IPCs and CC using a method 
modified from a previous study. sNPF was purchased from Thermo 
Fisher Scientific (sequence: AQRSPSLRLRF, purity >95%, unmodified). 
We made a 50 mM stock solution of sNPF in DMSO (Sigma, D2650) and 
kept it at −80 °C. Pertussis toxin (Tocris, 3097), U73122 (Tocris, 1268) 
or U73343 (Tocris, 4133) was mixed with D-glucose or sNPF. These mixtures 
were applied to IPCs in the brain or AKH-producing cells in CC. We mixed 
AHL containing 80 µM sNPF or 20 mM D-glucose with 1 ng µl−1 PTX, 1 µM 
U73122 or 1 µM U73343 (a non-functional enantiomer of U73122). The 
concentrations of PTX, U73122 and U73343 that were used in the 
experiments were determined on the basis of pilot experiments. 


Haemolymph glycaemia measurement 

Haemolymph glucose and trehalose levels were measured as previously 
described. In brief, 30-40 flies starved with water for 24 h (starved) or 
fed with standard fly food (fed) were decapitated and their haemolymph 
was collected with a capillary pipette (0.25 µl Microcaps, Drummond). 
Then, 0.25 µl of the haemolymph was mixed with 50 µl of Glucose (HK) 
Assay Kit reagent (Sigma, GAHK20) and incubated for 20 min at 23 °C, and 
the absorbance at 340 nm was measured using a Nanodrop spectrophotometer 
(Thermo Scientific). To measure trehalose concentrations in the 
haemolymph, pig kidney trehalase (1:500, Sigma, T8778) was added to the 
Glucose Assay Kit mixture, which was incubated for 16-20 h at 37 °C 
before the absorbance at 340 nm was measured. Standard curves (D-glucose 
and D-trehalose) were generated for each trial. Graph plotting and 
statistical analyses were conducted with GraphPad Prism 8.1.1. 
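
Converting absorbance readings to concentrations through a per-trial standard curve can be sketched as below. This is a generic least-squares illustration under the assumption that absorbance at 340 nm is linear in sugar concentration over the range of the standards; it is not the authors' procedure, and the function names and example values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired standards")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("standards must span more than one concentration")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_absorbance(absorbance, slope, intercept):
    """Invert the standard curve to recover a concentration."""
    if slope == 0:
        raise ValueError("flat standard curve; cannot invert")
    return (absorbance - intercept) / slope


if __name__ == "__main__":
    # Made-up standards: concentration (arbitrary units) against A340.
    standards_conc = [0.0, 1.0, 2.0, 4.0]
    standards_a340 = [0.05, 0.15, 0.25, 0.45]
    slope, intercept = fit_line(standards_conc, standards_a340)
    print(concentration_from_absorbance(0.35, slope, intercept))
```

For trehalose, the same inversion would be applied to the reading taken after trehalase digestion, using the D-trehalose standard curve from that trial.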


Statistics and reproducibility 

All statistical analyses were conducted with GraphPad Prism 8.1.1 
(provided by NYU School of Medicine). To compare two normally 
distributed groups, unpaired two-tailed t-tests were used. For 
multiple comparisons between normally distributed groups, one-way 
ANOVAs followed by Tukey's post hoc test were used (see Supplementary 
Table 1). The absence of an asterisk indicates a non-significant 
difference (P > 0.05). No statistical methods were used to predetermine 
sample size. The statistical analyses presented in this manuscript were 
assisted by a statistics expert (X. Li) at NYU. For the two-choice 
behavioural data (Figs. 1b, e, 5a, b, Extended Data Figs. 1a, d, 2c, 
4a-c), each data point (dot) represents a biological replicate of 30-40 
male flies in one trial. For ex vivo imaging data using GCaMP6s or 
Arclight (Figs. 2c, d, 3d, g, 4d, e, 5d, f, g, Extended Data Figs. 3b-e, 
3g-j, 4e-h, 6b, 8c, 10b), each data point (dot) represents a biological 
replicate of maximum or minimum amplitudes of ΔF/F (%) during the 
experimental period. In Extended Data Fig. 6c-f, each average trace 
contains at least three biological replicates. For confocal fluorescent 
images of the brains or CCs stained with antibodies (Figs. 2e-g, 3a-c, 
e, 4a-c, f, 5c, e, Extended Data Figs. 4i, j, 5, 9e), we present the 
exact number of biological replicates in Supplementary Table 1. 
Moreover, the representative images (Figs. 1a, c, d, 2a, 3c, 4c, 
Extended Data Figs. 1c, 2a, b, 9a, b) were independently replicated more 
than five times. For the mass spectrometry and dot blot assays (Figs. 3f, 
4g, Extended Data Fig. 7g-j), each data point (dot) or trial represents 
a biological replicate of 500-600 flies for mass spectrometry and 60-80 
flies for the dot blot assay. The two-choice behavioural and ex vivo 
imaging experiments were repeated more than three times, on the basis of 
our pilot experiments. For the behavioural experiments, we did not 
include two or three independent pilot experiments because the 
experimental set-ups were not clearly controlled at those times. These 
pilot experiments gave qualitatively similar results, but with 
substantially higher standard deviations. 


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


Raw mass spectrometry files have been deposited in the MassIVE data- 
base; with MassIVE accession ID: MSVO00083796. All other raw data 
are available from the corresponding author on reasonable request. 
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patterns of V758471-Gal4>UAS-mCD8::GFP line, VT43147-Gal4>UAS-mCD8.:GFP ***P < 0.001; one-way ANOVA with Tukey post hoc test. See Supplementary 
line and Crz-Gal4>UAS-mCD8.:GFP line in the brain, the VNC and a part of the Table 1 for the sample sizes and statistical analyses. 
[Figure panels: a, CN-Gal4 > UAS-DenMark, UAS-Syt::eGFP; b, R20F11-LexA > LexAop-GFP]

Extended Data Fig. 2 | Axons and dendrites of CN neurons and a LexA line
that labels a pair of CN neurons. a, Axons and dendrites of CN neurons in the
brain visualized by an axonal marker, UAS-Syt::eGFP, and a dendritic marker,
UAS-DenMark, under the control of CN-Gal4, stained with anti-GFP (green) and
anti-dsRed (DenMark, magenta, see Methods) antibodies. Arrowheads denote CN
cell bodies and arrows indicate CN axons (left) and dendrites (middle). b, The
brain of a fly carrying R20F11-LexA and LexAop-GFP, stained with anti-GFP
***P < 0.001; one-way ANOVA with Tukey post hoc test. See Supplementary
Table 1 for the sample sizes and statistical analyses.
were applied, the oscillation number (c) and oscillation frequency (e) decreased,
channel are required in CN neurons to respond to glucose, and the activity
of CN neurons is controlled by the internal energy state in live animals. a,
flies in which Glut1, SUR1 or voltage-gated calcium channel subunit (Ca-α1D) was
knocked down by RNAi, or those of control flies. i, j, Representative images (i)
channel by expressing UAS-SUR1 RNAi or UAS-Ca-α1D RNAi by CN-Gal4 blunts a
See Supplementary Table 1 for the sample sizes and statistical analyses.
e-h, Representative images (e) and quantifications (f-h) of the number of
Supplementary Table 1 for the sample sizes and statistical analyses.
Extended Data Fig. 7 | The circulating levels of dilp2 and AKH in
haemolymph are measured by mass spectrometry and dot blot assay.
a, Sequences of dilp2, AKH, tryptic peptide of dilp2 B chain and tryptic peptide
of AKH. We detected dilp2 B chain at m/z (mass to charge ratio): 369.1785
(TLCSEK, M2H+: 369.1785) and AKH at m/z: 497.2374 (QLTFSPDW, M2H+:
497.2374). b, Nomenclature and m/z values of fragment ions (N-terminal
directed ‘a’ and ‘b’ ions, as well as C-terminal directed ‘y’ ions) which are derived from
dilp2 B chain and AKH. c, d, Relative extracted ion intensities for the dilp2 B chain
and its fragment ions in each trial (c), and AKH and its fragment ions in each trial
(d) generated from the haemolymph of fed flies in which CN neurons were
inactivated, or those of control flies; see Methods. e, f, A dot blot (e) and its
quantification (f) show the levels of dilp2 in the haemolymph of wild type (w1118),
UAS-Kir2.1/CN-Gal4;dilp2-HA and UAS-Kir2.1/+;dilp2-HA flies, probed with
anti-HA antibody to detect dilp2. Because w1118 flies do not express dilp2-HA, they
were used as a negative control. g, h, A dot blot (g) and its quantification (h)
show the levels of AKH in the haemolymph of CN-Gal4/+, UAS-Kir2.1/+ and
UAS-Kir2.1/CN-Gal4 flies, probed with anti-AKH antibody. The intensity of black dots
in the red dashed circle represents the quantity of dilp2 or AKH that was later
normalized to Ponceau staining. For gel source data, see Supplementary Fig. 1.
**P < 0.01 and ***P < 0.001; unpaired two-tailed t-test (f) and one-way ANOVA with
Tukey post hoc test (h). See Supplementary Table 1 for the sample sizes and
statistical analyses.
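
The m/z values quoted in panel a can be checked with a short calculation. The sketch below is illustrative only and is not the authors' analysis pipeline: it assumes standard monoisotopic residue masses, a doubly protonated ion, and carbamidomethylation of cysteine (an assumption made here, not stated in this legend). The function name `peptide_mz` is hypothetical.

```python
# Monoisotopic residue masses (Da) for the amino acids in the two peptides.
RESIDUE_MASS = {
    "T": 101.04768, "L": 113.08406, "C": 103.00919, "S": 87.03203,
    "E": 129.04259, "K": 128.09496, "Q": 128.05858, "F": 147.06841,
    "P": 97.05276, "D": 115.02694, "W": 186.07931,
}
WATER = 18.010565            # added once per peptide (free N and C termini)
PROTON = 1.007276            # mass of one charge-carrying proton
CARBAMIDOMETHYL = 57.021464  # ASSUMPTION: cysteine alkylated during sample prep


def peptide_mz(sequence, charge=2, alkylated_cys=True):
    """Return the m/z of a peptide ion carrying `charge` protons."""
    mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    if alkylated_cys:
        mass += CARBAMIDOMETHYL * sequence.count("C")
    return (mass + charge * PROTON) / charge


dilp2_mz = peptide_mz("TLCSEK")    # tryptic peptide of the dilp2 B chain
akh_mz = peptide_mz("QLTFSPDW")    # AKH peptide, treated here as unmodified
print(f"TLCSEK   [M+2H]2+: {dilp2_mz:.4f}")
print(f"QLTFSPDW [M+2H]2+: {akh_mz:.4f}")
```

Under these assumptions the computed values should agree with the reported 369.1785 and 497.2374 to within rounding; a different cysteine modification or charge state would shift them.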
Extended Data Fig. 8 | Calcium responses to D-glucose or KATP channel
blocker by IPCs of flies in which CN neurons were silenced or sNPF
receptor was rendered nonfunctional. a, b, Individual traces of IPCs in control
flies carrying R20F11-LexA, dilp2-Gal4 and UAS-GCaMP6s (a) and in experimental
flies carrying R20F11-LexA, LexAop-TNT, dilp2-Gal4 and UAS-GCaMP6s (b)
responding to D-glucose. c, Average GCaMP traces and ΔF/F (max)
quantifications from IPCs of fed flies in which CN neurons were inactivated by
TNT in response to KATP channel blocker glibenclamide, or those of control flies.
d, IPCs partition into three subpopulations depending on their response to
glibenclamide; see Methods. e, f, Individual traces of IPCs in control flies (e) and
in flies in which CN neurons were inactivated by TNT (f) responding to
glibenclamide. Because a saturating concentration of D-glucose (20 mM) was
used to quantify the populations of IPCs, we used 100 µM glibenclamide, a
saturating concentration for glibenclamide according to our control
experiments. g, IPCs partition into three subpopulations according to their
response to glucose with or without sNPF receptor; see Methods. h, i, Individual
traces of IPCs in control flies (h) and in flies in which sNPF receptor was rendered
non-functional by expressing a dominant negative allele of sNPF receptor in the
IPCs (i) responding to D-glucose. ***P < 0.001; unpaired two-tailed t-test (c). See
Supplementary Table 1 for the sample sizes and statistical analyses.
Extended Data Fig. 9 | Crz is not required in CN neurons for the two-choice
behaviour and sugar-evoked Crz secretion. a, b, Expression of GFP in the
brains of flies carrying Crz-Gal4, UAS-mCD8::GFP (a) or Crz receptor (CrzR)-Gal4,
UAS-mCD8::GFP (b). Scale bar, 100 µm. c, d, Knockdown of Crz or Crz receptor in
flies carrying Crz-Gal4 and UAS-Crz RNAi (c) or CrzR-Gal4 and UAS-CrzR RNAis
(d), respectively, does not impair the selection of D-glucose in starved flies. e,
Immunoreactivity of intracellular Crz in CN neurons, probed with anti-Crz
antibody, when the brains were incubated in 80 mM sucrose, 80 mM D-glucose
(D-Glc), 80 mM D-glucose mixed with 0.5 µM TTX (D-Glc/TTX) or 80 mM
L-glucose (L-Glc) in AHL. Scale bar, 5 µm. Images are z-stacked projections.
One-way ANOVA with Tukey post hoc test (c, e) or unpaired two-tailed t-test (d).
See Supplementary Table 1 for the sample sizes and statistical analyses.
Extended Data Fig. 10 | Activity of IPCs was stimulated by sNPF application,
and the circulating trehalose level in which CN neurons had been
inactivated was reduced. a, b, Average GCaMP traces (a) and ΔF/F (max)
quantifications (b) from the IPCs in response to 80 µM sNPF in AHL or DMSO in
neurons and IPCs (green), and between CN neurons and AKH-producing cells
(red). CN neurons regulate glucose homeostasis by counter-balancing the
activities of IPCs and AKH-producing cells through sNPF neurotransmitter that
activates IPCs and inactivates AKH-producing cells. *P < 0.05, **P < 0.01 and
***P < 0.001; unpaired two-tailed t-test (b) and one-way ANOVA with Tukey post
hoc test (c). See Supplementary Table 1 for the sample sizes and statistical
analyses.
Statistical parameters


When statistical analyses are reported, confirm that the following items are present in the relevant location (e.g. figure legend, table legend, main 
text, or Methods section). 


n/a | Confirmed 


The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement 


An indication of whether measurements were taken from distinct samples or whether the same sample was measured repeatedly 


The statistical test(s) used AND whether they are one- or two-sided 
Only common tests should be described solely by name; describe more complex techniques in the Methods section. 


A description of all covariates tested 


A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons 


A full description of the statistics including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND 
variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals) 


For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted 
Give P values as exact values whenever suitable. 


For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings 


For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes 


Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated 


Clearly defined error bars 
State explicitly what error bars represent (e.g. SD, SE, CI)


Our web collection on statistics for biologists may be useful. 


Software and code 


Policy information about availability of computer code 


Data collection: We did not use any unpublished code to collect the data in this study. We used LSM800 and ZEN software (Carl Zeiss, ZEN 2.3 SP1 FP1,
version: 14.0.12.201) and a Prairie two-photon microscope and its software (Prairie Technologies Inc., Prairie View v4.3.2.18). We also
conducted mass spectrometry and dot blot assay as described in Methods. We also used Fiji software to collect dot blot images.


Data analysis: We did not use any unpublished code to analyze the data in this study. We used ZEN image analysis software (Carl Zeiss, ZEN 2.3 SP1
FP1, version: 14.0.12.201), GraphPad Prism 8.1.1, ImageJ 1.52a, Microsoft Excel (Microsoft Office Professional Plus 2016), and specific
programs to analyze mass spectrometry data as described in the Methods section (MaxQuant proteomics software v1.5.7.0, Thermo Scientific
Xcalibur v4.1.31.9, Skyline Proteomics software v4.10.18169).


For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers 
upon request. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information. 
Data


Policy information about availability of data 
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: 


- Accession codes, unique identifiers, or web links for publicly available datasets 
- A list of figures that have associated raw data 
- A description of any restrictions on data availability 


All other raw data are available from the corresponding author on reasonable request. Raw mass spectrometry files have been deposited in the MassIVE database 
(https://massive.ucsd.edu/ProteoSAFe/static/massive.jsp); with MassIVE accession ID: MSV000083796. 


Field-specific reporting 


Please select the best fit for your research. If you are not sure, read the appropriate sections before making your selection. 


[x] Life sciences [ ] Behavioural & social sciences [ ] Ecological, evolutionary & environmental sciences


For a reference copy of the document with all sections, see nature.com/authors/policies/ReportingSummary-flat.pdf 


Life sciences study design 


All studies must disclose on these points even when the disclosure is negative. 


Sample size: No sample size was predetermined before the experiments. Sample size was determined based on the consistency of measurable differences
between groups. We followed the previous study (Dus et al., 2015) to determine the statistical methods. We described the number of
independent replications and sample size in the "Statistics and reproducibility" section and Supplementary Table 1.


Data exclusions: In principle, we did not exclude any data in this study. However, some trials were excluded because tested flies stopped moving during the
behavioural assays or tested cells were sick or died during the imaging experiments.


Replication: We replicated all the experiments and compared the data at least twice independently.


Randomization: We tried to randomize the population of flies in the same genotype. We matched the sex and age of flies in every trial.


Blinding: To reduce bias, investigators checked genotypes after conducting the experiments and data collection. Key experiments were carried out by
multiple authors.


Reporting for specific materials, systems and methods 


Materials & experimental systems Methods 
n/a | Involved in the study n/a | Involved in the study 
Unique biological materials ChIP-seq 
Antibodies Flow cytometry 
Eukaryotic cell lines MRI-based neuroimaging 


Palaeontology 


Animals and other organisms 


Human research participants 


Antibodies 


Antibodies used: The primary antibodies were used as follows: chicken anti-GFP (1:500; Invitrogen, A10262), rabbit anti-GFP (1:500; Invitrogen,
A-11122), mouse anti-GFP (1:100; Sigma, G6539, used for synaptobrevin-GRASP), mouse anti-nc82 (1:25; Developmental Studies
Hybridoma Bank, DSHB, AB-2314866), rabbit anti-dsRed (1:500; Clontech, 632496), rabbit anti-corazonin (Crz) (1:500; a gift from
Jan Veenstra, Université de Bordeaux, France), rabbit anti-sNPF (1:500; a gift from Jan Veenstra, Université de Bordeaux, France),
rabbit anti-dilp2 (1:500; a gift from Ernst Hafen, Institute for Molecular Systems Biology, Zurich, Switzerland), mouse anti-HA
(1:500; Covance, 901501), and rabbit anti-AKH (1:500; gifts from Jae H. Park, University of Tennessee, Knoxville, TN, and Seung K.
Kim, Stanford University, CA) antibodies. Secondary antibodies were used as follows: Alexa Fluor 633 goat anti-rabbit IgG (1:500;
Invitrogen, A-21070), Alexa Fluor 555 goat anti-mouse IgG (1:500; Invitrogen, A-21127), Alexa Fluor 555 goat anti-rabbit IgG
(1:500; Invitrogen, A27039), Alexa Fluor 488 goat anti-rabbit IgG (1:500; Invitrogen, A27034), Alexa Fluor 488 goat anti-mouse
IgG (1:500; Invitrogen), and Alexa Fluor 488 goat anti-chicken IgG (1:500; Invitrogen, A28175).


Validation All the primary antibodies used in this study were confirmed in the previous studies as listed below. 
1: chicken anti-GFP (Invitrogen, A10262): Potdar, S. & Sheeba, V. Wakefulness Is promoted during day time by PDFR Signalling to 
dopaminergic neurons in Drosophila melanogaster. eneuro 5, ENEURO.0129-0118.2018 (2018). 
2: rabbit anti-GFP (Invitrogen, A-11122): Zhang, P. et al. Heparan sulfate organizes neuronal synapses through neurexin 
partnerships. Cell 174, 1450-1464.e1423 (2018). 
3: mouse anti-GFP (Sigma, G6539): Fushiki, A. et al. A circuit mechanism for the propagation of waves of muscle contraction in 
Drosophila. eLife 5, e13253 (2016). 
4: mouse anti-nc82 (DSHB, AB-2314866): Ding, Y. et al. Neural evolution of context-dependent fly song. Curr. Biol. 29, 
1089-1099.e1087 (2019). 
5: rabbit anti-dsRed (Clontech, 632496): Ni, J. D. et al. Differential regulation of the Drosophila sleep homeostat by circadian and 
arousal inputs. eLife 8, e40487 (2019). 
6: rabbit anti-corazonin (Crz) (Jan Veenstra): Kapan, N., Lushchak, O. V., Luo, J. & Nassel, D. R. Identified peptidergic neurons in 
the Drosophila brain regulate insulin-producing cells, stress responses and metabolism by coexpressed short neuropeptide F and 
corazonin. Cell. Mol. Life Sci. 69, 4051-4066 (2012). 
7: rabbit anti-sNPF (Jan Veenstra): Knapek, S., Kahsai, L., Winther, A. M. E., Tanimoto, H. & Nassel, D. R. Short neuropeptide F 
acts as a functional neuromodulator for olfactory memory in kenyon cells of Drosophila mushroom bodies. The Journal of 
Neuroscience 33, 5340-5345 (2013). 
8: rabbit anti-dilp2 (Ernst Hafen): Ikeya, T., Galic, M., Belawat, P., Nairz, K. & Hafen, E. Nutrient-dependent expression of insulin- 
like peptides from neuroendocrine cells in the CNS contributes to growth regulation in Drosophila. Curr. Biol. 12, 1293-1300 
(2002). 
9: mouse anti-HA (Covance, 901501): Kim, Y. et al. Methylation-dependent regulation of HIF-1a stability restricts retinal and 
tumour angiogenesis. Nature Communications 7, 10347 (2016). 
10: rabbit anti-AKH (Jae H. Park): Lee, G. & Park, J. H. Hemolymph sugar homeostasis and starvation-induced hyperactivity 
affected by genetic manipulations of the adipokinetic hormone-encoding gene in Drosophila melanogaster. Genetics 167, 
311-323 (2004). 
11: rabbit anti-AKH (Seung K. Kim): Kim, S. K. & Rulifson, E. J. Conserved mechanisms of glucose sensing and regulation by 
Drosophila corpora cardiaca cells. Nature 431, 316-320 (2004). 
Animals and other organisms


Policy information about studies involving animals; ARRIVE guidelines recommended for reporting animal research 


Laboratory animals Male flies (Drosophila melanogaster) between 3-10 days of age were used in this study. All the flies were outcrossed using 
w1118 (BL6326) over five times before the experiments 


Wild animals We did not use wild animals in this study. Instead, we used w1118 as a control. 


Field-collected samples We did not use any field-collected samples or animals in this study. 
Co-inhibitory immune receptors can contribute to T cell dysfunction in patients with
cancer. Blocking antibodies against cytotoxic T-lymphocyte-associated protein 4
(CTLA-4) and programmed cell death 1 (PD-1) partially reverse this effect and are
becoming standard of care in an increasing number of malignancies. However, many
of the other axes by which tumours become inhospitable to T cells are not fully
understood. Here we report that V-domain immunoglobulin suppressor of T cell
activation (VISTA) engages and suppresses T cells selectively at acidic pH such as that
found in tumour microenvironments. Multiple histidine residues along the rim of the
VISTA extracellular domain mediate binding to the adhesion and co-inhibitory
receptor P-selectin glycoprotein ligand-1 (PSGL-1). Antibodies engineered to selectively
bind and block this interaction in acidic environments were sufficient to reverse
VISTA-mediated immune suppression in vivo. These findings identify a mechanism by which
VISTA may engender resistance to anti-tumour immune responses, as well as an
unexpectedly determinative role for pH in immune co-receptor engagement.


VISTA (also known as B7-H5, PD-1H, Gi24, Dies1, SISP1 and DD1α) is a
B7 family ligand that is expressed on circulating and intratumoural
myeloid cells and weakly expressed on activated lymphocytes. It has
been shown to inhibit T cell responses in vitro and in preclinical models
of autoimmunity and cancer. VISTA has also been recognized as a
potential mediator of resistance to anti-PD-1 and anti-CTLA-4
immunotherapies in patients. However, opportunities for therapeutic
intervention have been limited by a lack of understanding of VISTA’s
counter-receptor and function. Here we report that VISTA is an acidic
pH-selective ligand for PSGL-1.

Compared with other immunoglobulin superfamily members, the
extracellular domain of VISTA is uniquely rich in histidine residues
(Fig. 1a, Extended Data Fig. 1a). Because the imidazole sidechain of
histidine protonates at physiologically relevant pH, we hypothesized
that VISTA preferentially engages its counter-receptor in acidic
environments such as tumour beds, where pH values as low as 5.85 have
been measured. We found that VISTA multimers bound detectably
to leukocytes at acidic pH, but not at the physiological pH 7.4 (Fig. 1b, c,
Extended Data Fig. 1b-d). Mouse VISTA also demonstrated pH-selective
binding (Extended Data Fig. 1e). We tested a panel of monoclonal
antibodies against VISTA for their ability to inhibit binding. VISTA.4 and other
antibodies in the same epitope bin blocked VISTA binding to T cells at
acidic pH, while VISTA.5 and other antibodies in its epitope bin did not
(Fig. 1d, Extended Data Fig. 1f-h). VISTA has been shown to inhibit T cell
function in a variety of contexts and without a clear requirement for
acidic pH. To address the role of pH in VISTA function, we cultured
T cells and Jurkat cells with VISTA-expressing cells or recombinant VISTA.
Whereas VISTA suppressed proliferation, IFN-γ production, and NF-κB
phosphorylation at pH 7.4, its effect was much more pronounced at
acidic pH (Fig. 1e, f, Extended Data Fig. 2a-e). VISTA.4, but not VISTA.5,
restored T cell responsiveness (Fig. 1e, f, Extended Data Fig. 2a-e).
These results suggested that VISTA is functionally pH-selective, and
that antibodies that block VISTA binding to T cells at acidic pH reverse
its suppressive activity.
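
The reasoning behind the pH hypothesis can be illustrated with the Henderson-Hasselbalch relation. The sketch below is not taken from the paper: it assumes a textbook side-chain pKa of about 6.0 for free histidine (actual pKa values in a folded protein depend on the local environment), and the function name `fraction_protonated` is hypothetical.

```python
def fraction_protonated(ph, pka=6.0):
    """Fraction of a titratable group in its protonated form at a given pH.

    Henderson-Hasselbalch: [HA] / ([HA] + [A-]) = 1 / (1 + 10**(pH - pKa)).
    The default pKa of 6.0 is an ASSUMED textbook value for the histidine
    imidazole side chain, not a value measured for VISTA.
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Compare the acidic tumour pH cited in the text, the assay pH and blood pH.
for ph in (5.85, 6.0, 7.4):
    print(f"pH {ph}: {fraction_protonated(ph):.3f} protonated")
```

With this assumed pKa, roughly half of the histidine side chains carry a positive charge at pH 6.0, compared with only a few per cent at pH 7.4, which is the kind of shift that could switch a charge-dependent interface on and off.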

We subsequently found that VISTA.4 recognizes a histidine-rich
epitope and is itself sensitive to pH in binding assays (Extended Data
Fig. 2f-i). This led us to hypothesize that histidine protonation
enables antibodies to distinguish the active (acidic pH) and inactive
(neutral pH) states of the VISTA ligand interface. Mutational scanning of
VISTA.4 identified variants with a broad range of pH preferences (Fig. 1g,
Extended Data Fig. 3a). Further engineering of rare non-pH-sensitive
blocking antibodies such as VISTA.16 produced progeny that were
Fig. 1 | VISTA is pH-selective. a, Immunoglobulin superfamily extracellular
domain histidine frequencies and lengths. The size of each point corresponds to
the number of histidine residues. VISTA is highlighted in red. b, VISTA multimer
binding to human lymphocytes at pH 6.0 and 7.4. Lymphocytes left unbound
(FMO) or bound with non-VISTA-loaded multimers are included as controls.
These data are representative of three independent experiments. c, VISTA
multimer binding to activated human CD4+ T cells at the indicated pH. These
data are representative of seven independent experiments. d, VISTA multimer
binding to activated human CD4+ T cells at pH 6.0 and in the presence of VISTA.4
(red), VISTA.5 (blue) or a control antibody (black). Data are VISTA multimer
mean fluorescence intensity (MFI) and are representative of six independent
experiments. e, Effects of VISTA.4 (red), VISTA.5 (blue) and control antibodies
(black) on human CD4+ T cell proliferation during co-culture with 293T-OKT3-
VISTA cells, which ectopically express a single chain variable fragment of the
agonistic CD3 monoclonal antibody OKT3 and human VISTA. Data are the per
cent of T cells that proliferated and are representative of three independent
experiments. f, Effects of VISTA.4 (green), VISTA.5 (blue) and control antibodies
(red) on human CD4+ T cell NF-κB phosphorylation during T cell receptor and
VISTA stimulation at the indicated pH. Non-T cell receptor-stimulated cells
(grey) and non-VISTA-stimulated cells (black) are also included as controls. Data
are pNF-κB MFI normalized to control ± s.e.m. n = 2 T cell donors; these data are
representative of two independent experiments. g, Surface plasmon resonance
(SPR) sensorgrams of VISTA.4, a pH-independent variant of VISTA.4 and an
acidic pH-selective variant of VISTA.4 binding to human VISTA at pH 6.0 (dashed
traces) and pH 7.4 (solid traces). These data are representative of two
independent experiments.


Fig. 2 | Crystal structure of VISTA and blocking
antibody epitope. a, The structure of the human
VISTA IgV domain (green) in complex with the VISTA.18
Fab (heavy chain, dark grey; light chain, light grey).
b, A superimposition of the VISTA (green) and PD-L1
(purple) IgV domains. VISTA histidine residues are
depicted in stick representation. Histidine residues
occupying the loop between the central β-sheets of
VISTA (H153, H154 and H155) are labelled. H100, H101
and H104 are in disordered regions and are not
depicted. c, The molecular surface of the VISTA.18
epitope (yellow). d, An enlarged view of the interface
between VISTA (green, with epitope residues depicted
in stick representation) and VISTA.18 (depicted as an
electrostatic surface).
Fig. 3 | PSGL-1 is a VISTA receptor at acidic pH. a, Results of VISTA-Fc (right)
and anti-CD3 (left) receptor capture on human CD4+ T cells at pH 6.0. log2(fold
enrichment) and -log adjusted P values are plotted on the x and y axes,
respectively. Among the captured proteins, bait components are coloured blue
and putative counter-receptors are coloured red, green (VISTA) and purple
(GP1BA). These data are representative of two independent experiments. b,
Bio-layer interferometry (BLI) binding magnitudes for P-selectin and VISTA binding
to captured PSGL-1 at pH 6.0 (green) and 7.4 (blue). These data are representative
of ten independent experiments. c, VISTA multimer binding to activated human
CD4+ T cells at pH 6.0 with (red) and without (blue) PSGL-1 gene deletion.
Unbound cells (FMO, black) are included as controls. Data are VISTA multimer
MFI ± s.e.m. and are a composite of five independent experiments. d, Left, VISTA
multimer binding to CHO cells expressing PSGL-1 at pH 6.0. Right, effects of
VISTA.4 (red) and the PSGL-1 antibody KPL1 (blue) on binding. Data are the per
cent reduction of VISTA-Fc MFI relative to control and are representative of eight


highly selective for acidic pH, including the clone VISTA.18 (Extended 
Data Fig. 3b-d). 

To characterize VISTA’s structure and the determinants of
blocking-antibody binding, we co-crystallized the VISTA immunoglobulin
variable (IgV) domain with the fragment antigen-binding (Fab) domain
of VISTA.18 (Fig. 2a, Extended Data Fig. 4a, Supplementary Table 1).
The VISTA IgV domain is elongated by additional residues in its two
independent experiments. e, BLI binding magnitudes of VISTA and P-selectin
binding to PSGL-1 produced without sialyl-Lewis X decoration (SLX-) or with low
levels of tyrosine sulfation (sY-poor) at pH 6.0 (green) and pH 7.4 (blue). These
data are representative of one experiment. f, BLI binding magnitudes of PSGL-1
binding to VISTA with histidine residues at positions 153/154/155 left intact (WT)
or replaced by alanine (H>A), aspartic acid (H>D), or arginine (H>R) at pH 6.0
(green) and pH 7.4 (blue). These data are representative of three independent
experiments. g, A computational model of PSGL-1 (grey) bound to VISTA (cyan).
Key residues are depicted in stick representation. h, BLI magnitudes for PSGL-1
binding to wild-type, H98R/H100R, H153R/H154R/H155R and H98R/H100R/
H153R/H154R/H155R (quintuple) VISTA at pH 7.4. These data are representative
of two independent experiments. i, Wild-type and H>R mutant VISTA-Fc
binding to activated human T cells at pH 7.4. These data are representative of
two independent experiments. WT, wild-type.


C-terminal β-strands, and histidine residues are concentrated on a
loop connecting the strands (Fig. 2b). VISTA.18 binds this loop, whereas
the non-blocking antibody VISTA.5 binds an opposing region (Fig. 2c,
Extended Data Fig. 4b, c). The VISTA.18 complementarity-determining
region (CDR) residues E100 and D102 form hydrogen bonds with VISTA
histidine residues H153 and H154 and mediate the antibody’s selectivity
for acidic pH (Fig. 2d, Extended Data Fig. 4d, e).
Fig. 4 | VISTA blockade at acidic pH reverses its suppressive effects in vivo.
a, b, MC38 tumour-bearing wild-type mice were treated with a control antibody,
a VISTA-blocking antibody, a PD-1-blocking antibody, or both VISTA- and
PD-1-blocking antibodies. n = 10 (a) or 5 (b) mice per group; these data are
representative of three independent experiments. a, Tumour volumes over
time. TF denotes mice that were tumour-free at the end of the study.
b, Frequency (left) and PD-1 MFI (right) of intratumoural CD8+ T cells. Data are
means ± s.e.m. with one-way ANOVA and Dunnett's multiple comparisons.
*P = 0.001, **P = 0.007 and ***P < 0.0001. c, MC38 tumour-bearing
VISTA-knockout (KO) mice and wild-type littermates were treated with a control
antibody (VISTA knockout, red; wild type, black) or a PD-1 blocking antibody
(VISTA knockout, purple; wild type, blue). n = 5-8 as indicated; these data are
representative of two independent experiments. Data are
medians ± interquartile ranges. d, Human VISTA knock-in (KI) mice and their
wild-type littermates were treated with VISTA.16 (VISTA knock-in, blue; wild
type, black) or VISTA.18 (VISTA knock-in, red; wild type, grey). Data are mean


We then performed ligand-based receptor capture with VISTA-Fc
chimeric protein and human CD4+ T cells at acidic pH. PSGL-1 was
one of few proteins that were enriched relative to controls (Fig. 3a).
In addition to its well-characterized role facilitating adhesion
interactions between leukocytes, platelets and endothelial cells, PSGL-1
has been identified as a negative regulator of T cell responses in
contexts of chronic viral infection, cancer, and some autoimmune
diseases. To confirm the interaction, we conducted Octet
biosensor and isothermal titration calorimetry assays with the minimal
PSGL-1 glycopeptide that supports high affinity P-selectin binding
blood antibody concentrations ± s.e.m. The calculated MRT for VISTA.16 and
VISTA.18 in VISTA knock-in mice were 4.1 and 71 h, respectively. n = 4 knock-in;
n = 1 VISTA.16, wild-type; and n = 2 VISTA.18, wild-type mice per group; data are
representative of one experiment. e, Human VISTA knock-in mice bearing MC38
tumours were treated with fluorescently labelled VISTA.16 (left) or VISTA.18
(right). Organs were imaged at 51 h post-injection. These data are representative
of two independent experiments. f, Cynomolgus macaques were treated with
VISTA.4 (red circles) or VISTA.18 (blue squares). Data are serum antibody
concentrations. The calculated MRTs were 7.6 h and 717 h, respectively. n = 1
macaque per antibody; these data are representative of one experiment.
g, Human VISTA knock-in mice bearing MC38 tumours were treated with a
control antibody, a PD-1 blocking antibody, the non-pH-selective antibody
VISTA.16 and/or the acidic pH-selective antibody VISTA.18. Data are tumour
volumes. n = 10-16 per group as indicated; these data are a composite of two
independent experiments.
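
The legend reports mean residence times (MRT) without describing how they were calculated. One common non-compartmental definition is MRT = AUMC/AUC, the ratio of the area under the first-moment curve to the area under the concentration-time curve. The sketch below illustrates that definition on synthetic data; it is not the authors' method, the concentration values are hypothetical, and the helper names `trapezoid` and `mean_residence_time` are introduced here for illustration.

```python
import math


def trapezoid(xs, ys):
    """Area under y(x) by the linear trapezoidal rule."""
    return sum((xs[i + 1] - xs[i]) * (ys[i + 1] + ys[i]) / 2.0
               for i in range(len(xs) - 1))


def mean_residence_time(times_h, concentrations):
    """Non-compartmental MRT = AUMC / AUC over the sampled interval."""
    auc = trapezoid(times_h, concentrations)
    aumc = trapezoid(times_h, [t * c for t, c in zip(times_h, concentrations)])
    return aumc / auc


# HYPOTHETICAL data: mono-exponential clearance with rate k (per hour),
# for which the exact MRT is 1 / k.
k = 0.1
times = [i * 0.1 for i in range(2001)]        # 0 to 200 h
conc = [math.exp(-k * t) for t in times]      # arbitrary concentration units
mrt = mean_residence_time(times, conc)
print(f"MRT = {mrt:.2f} h (exact value for this curve: {1 / k:.2f} h)")
```

For real serum data the sampling is sparse and the curve is truncated, so the tail beyond the last time point is usually extrapolated before taking the ratio; that step is omitted here.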


(Extended Data Fig. 5). PSGL-1 bound VISTA selectively at acidic pH
with an affinity of 0.65-0.85 µM and a stoichiometry approaching 1:1
(Fig. 3b, Extended Data Fig. 6a-d, Supplementary Table 2). Binding was
blocked by recombinant P-selectin, VISTA blocking antibodies and the
P-selectin-blocking PSGL-1 antibody KPL1 (Extended Data Fig. 6e-h).
Gene deletion of PSGL-1 significantly reduced VISTA binding to T cells
(Fig. 3c, Extended Data Fig. 6i), whereas ectopic expression of PSGL-1
was sufficient to enable VISTA binding to Chinese hamster ovary cells
at acidic pH (Fig. 3d and Extended Data Fig. 6j). Similar to P-selectin,
VISTA also bound modestly to heparan sulfate (Extended Data Fig. 6k).


Glycoprotein Ib platelet subunit-α (GP1BA) was captured by VISTA as
well (Fig. 3a), but we could not detect VISTA binding to recombinant
GP1BA or to platelets (Extended Data Fig. 7a, b). VISTA has recently
been reported to bind V-set immunoglobulin domain containing 3
(VSIG-3), a surface receptor expressed in brain, testis and some cancer
tissues. We observed moderately pH-selective binding of VISTA to
VSIG-3 using the Octet biosensor, but could not detect specific
binding in cell-based assays, and found no competition between VSIG-3
and PSGL-1 (Extended Data Fig. 7c-h). VISTA has also been reported
to engage in homotypic binding, but we were unable to confirm this
interaction (Extended Data Fig. 7i).

We next characterized the specificity of PSGL-1-VISTA binding. PSGL-1
binding to P-selectin is supported by sulfotyrosine and sialyl-Lewis X
tetrasaccharide post-translational modifications. VISTA binding to
PSGL-1 was independent of sialyl-Lewis X but dependent on tyrosine
sulfation (Fig. 3e, Extended Data Figs. 5d, 8a, b).

Blocking antibody coverage of VISTA histidine residues H153, H154 
and H155 suggested that these residues support PSGL-1 binding (Fig. 2). 
Replacement of these histidines with negatively charged aspartic acid 
eliminated VISTA binding and function in biophysical and cell-based 
assays (Fig. 3f, Extended Data Fig. 8c-f). Replacement with positively 
charged arginine left VISTA activity at acidic pH intact, but conferred only 
weak binding at pH 7.4, indicating that these three histidines were nec- 
essary but not sufficient for binding (Fig. 3f, Extended Data Fig. 8c-f). 
To identify other relevant residues, we used the solved structures of
PSGL-1 bound to P-selectin and VISTA bound to VISTA.18 Fab (Fig. 2) to
computationally model PSGL-1 docked to VISTA (Fig. 3g). In this model,
sulfated PSGL-1 tyrosine residues Y46 and Y48 make ionic interactions
with protonated VISTA histidine residues H153 and H154, whereas VISTA
H98 and H100 appear to interact with PSGL-1 E56 and Y51. Substitution of
histidines H98, H100, H153, H154 and H155 with arginine enabled VISTA
to bind to PSGL-1 and to T cells at pH 7.4 (Fig. 3h, i, Extended Data
Fig. 8g). In the same model, the hydroxyl group of PSGL-1 T57, which can
be decorated with sialyl-Lewis X, points away from VISTA, consistent
with the negligible influence of sialyl-Lewis X on VISTA-PSGL-1 binding
(Fig. 3g). These data demonstrated that PSGL-1 binding to VISTA
is mediated by charged interactions between sulfated tyrosine and
protonated histidine residues.

Finally, we examined the role of VISTA in anti-tumour immune
responses. Whereas single agent treatment with a mouse VISTA blocking
antibody had little effect, co-blockade of VISTA and PD-1 elicited
tumour rejection in a majority of mice implanted with MC38 colorectal
adenocarcinomas (Fig. 4a, Extended Data Fig. 9a). Combination therapy
also enhanced T cell tumour infiltration and reduced intratumoural 
T cell expression of the co-inhibitory receptors PD-1, LAG-3 and TIM-3 
(Fig. 4b, Extended Data Fig. 9b, c). Intratumoural myeloid cell frequen- 
cies were largely unaffected by VISTA blockade (Extended Data Fig. 9d). 
These results were phenocopied in VISTA-knockout mice, indicating that 
blocking antibodies can reverse VISTA-mediated immune suppression 
(Fig. 4c, Extended Data Fig. 9e). 

To assess VISTA-mediated suppression within tumour microenvi- 
ronments, we treated mice expressing the human VISTA extracellular 
domain (Extended Data Fig. 9f-h) with acidic pH-selective and non- 
pH-selective human VISTA-blocking antibodies. VISTA expression on 
circulating and organ-resident myeloid cells subjects antibodies to 
extensive target-mediated drug disposition (Extended Data Fig. 9i). 
Consistent with this effect, the non-pH-selective antibody VISTA.16 
exhibited a short blood mean residence time (MRT) and localized 
primarily to leukocyte-rich organs (Fig. 4d, e, Extended Data Fig. 10a). 
By contrast, the acidic pH-selective antibody VISTA.18 accumulated
preferentially within tumours and exhibited a much longer blood and
serum MRT in mice and a cynomolgus macaque, confirming its inability
to engage VISTA efficiently at physiological pH in vivo (Fig. 4d-f,
Extended Data Fig. 10a-e). VISTA.18 nevertheless matched VISTA.16 in
therapeutic benefit in combination with anti-PD-1 (Fig. 4g, Extended
Data Fig. 10f). These data support the hypothesis that VISTA functions
as an acidic-pH-selective immune checkpoint, although further study
is needed into its mechanisms of action and the relevance of PSGL-1
in vivo.

Typically, activation-induced expression of co-inhibitory receptors
results in preferential restraint of maturing immune responses.
VISTA instead appears to utilize pH selectivity to achieve a similar 
outcome, with suppression occurring in inflamed and acidic envi- 
ronments such as tumours rather than in lymphoid organs or the 
blood. This suggests that immune responses can be regulated by 
checkpoints specific to acidic environments, and that further study of 
pH selectivity may afford new opportunities for immunotherapeutic 
drug development. 
Methods


No statistical methods were used to predetermine sample size.
Randomization on the basis of tumour volume was performed prior to the
initiation of treatment in mouse tumour studies. The investigators were
not blinded to allocation during experiments and outcome assessment.


Immunoglobulin superfamily histidine analysis and VISTA 
sequence alignments 

The amino acid sequences of the extracellular domains of immuno- 
globulin domain-containing proteins were extracted from the Uni- 
Prot and Swiss-Prot databases. The number and frequency of histidine 
residues were calculated for each protein. NCBI VISTA amino acid 
reference sequences for Homo sapiens (NP_071436.1), Pan troglodytes 
(XP_001135701.2), Macaca fascicularis (predicted, XP_015311697.1), Canis 
lupus familiaris (XP_013968352.2), Rattus norvegicus (NP_001037765.1) 
and Mus musculus (NP_083008.1) were selected for sequence alignment. 
VISTA residues are numbered inclusive of the signal peptide. 
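
The histidine count and frequency calculation described above can be sketched in a few lines. This is a minimal illustration, not the authors' script: the function name `histidine_stats` and the toy sequence are hypothetical, and frequency is assumed to mean the number of histidines divided by the length of the extracellular-domain sequence.

```python
def histidine_stats(sequence: str) -> tuple[int, float]:
    """Return (histidine count, histidine frequency) for a protein sequence.

    Assumption: frequency = count of 'H' residues / total residues.
    """
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    count = seq.count("H")
    return count, count / len(seq)


# Toy sequence for illustration only (not a real VISTA sequence).
toy_ecd = "MHHAKLHG"
count, freq = histidine_stats(toy_ecd)
print(f"{count} histidines, frequency {freq:.3f}")
```

Applied to each extracted extracellular domain in turn, a function of this kind would give the per-protein values that the analysis compares.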


Cell binding and blocking assays 

In human VISTA binding and blocking assays, phycoerythrin (PE)-con- 
jugated streptavidin multimers (Klickmers, Immudex) were diluted to 
32 nM in Hanks' Balanced Salt Solution (HBSS, with calcium and magnesium)
adjusted with 2-(N-morpholino)ethanesulfonic acid (MES) to
the indicated pH. The diluted dextramers were loaded with 32-900 
nM recombinant monobiotinylated human VISTA (ACRO Biosystems) 
to facilitate VISTA-dextramer capture. ‘Empty’ multimers that were 
not incubated with VISTA were used as a negative control. Human leu- 
kocytes, unstimulated peripheral blood mononuclear cells (PBMC) 
T cells, PBMC T cells stimulated for 72-96 h with anti-CD3/CD28 bead 
stimulation (Human T-activator Dynabeads, ThermoFisher), or Chinese 
Hamster Ovary (CHO) cells were labelled with VISTA multimers for 30 
min at room temperature before washing with the same HBSS + MES 
buffers. Alternatively, cells were labelled with human VISTA-Fc chimeric 
proteins, and binding was detected with anti-IgG secondary antibodies 
(Jackson ImmunoResearch) diluted into the same HBSS + MES buffers. 
Labelled cells were left unfixed or fixed with formaldehyde (FoxP3
fixation buffer, eBioscience) and acquired on a flow cytometer.

In VISTA antibody blocking assays, 100 nM-loaded VISTA multimers 
or VISTA-Fc chimeric proteins were pre-incubated with the indicated 
antibodies before cell binding. In recombinant protein blocking assays, 
cells were pre-incubated with the indicated recombinant proteins before 
labelling with 100 nM-loaded VISTA multimers or VISTA-Fc chimeric
proteins. 

In PSGL-1 antibody blocking assays, cells were pre-incubated with 
KPL1 (BD Biosciences or Biolegend) or PL2 (MBL) before labelling with 
32 nM-loaded VISTA multimers or VISTA-Fc chimeric proteins. VISTA-Fc
binding was detected by anti-IgG (Jackson ImmunoResearch) or 6xHis
(Columbia Biosciences) antibodies. Cells were acquired by flow cytometry
or homogeneous time-resolved fluorescence (HTRF).

In mouse VISTA binding and blocking assays, mouse splenocytes and 
lymph node-resident cells were used directly ex vivo or first stimulated 
for 48 h with anti-CD3/CD28 bead stimulation (Mouse T-activator Dyna- 
beads, ThermoFisher). Cells were labelled with mouse VISTA-Fc chimeric 
proteins in pH 6.0 HBSS or PBS, and binding was detected
with anti-IgG secondary antibodies (Jackson ImmunoResearch). In anti- 
body blocking assays, mouse VISTA-Fc was pre-incubated with VISTA.10 
before cell labelling. Binding of VISTA to individual leukocyte subsets was 
determined by staining for CD4, CD8, B220 and CD11b (ThermoFisher). 
Cells were acquired on a flow cytometer.

In VSIG-3 binding assays, CHO and HEK293 cells were engineered 
to ectopically express human VSIG-3 and VISTA respectively. VSIG-3 
expression was confirmed by flow cytometry using anti-VSIG-3 (pAb 
AF4915, R&D Systems). VISTA expression was confirmed by flow cytom- 
etry using anti-VISTA (clone 740804, R&D Systems). Cell binding assays 


were performed in PBS buffers containing 0.9 mM CaCl2, 0.05 mM MgCl2
and 0.5% BSA that were adjusted to the indicated pH by varying the
ratios of Na2HPO4 and KH2PO4. VISTA-Fc and VSIG-3-Fc were used at
10 μg ml⁻¹. Binding was detected with anti-human IgG F(ab')2-PE (Invitrogen).


Recombinant VISTA and PSGL-1 proteins 

Histidine-tagged human, cynomolgus macaque, and mouse VISTA 
extracellular domains were produced by transient transfection of 
Expi293 cells. Proteins were affinity purified via the histidine tag and 
then size-exclusion chromatography (SEC) (Superdex200). Human and 
mouse VISTA-Fc chimeric proteins were purchased from R&D Systems 
or produced by transient transfection of Expi293 cells. VISTA-Fc pro- 
teins with H98, H100, H153, H154, and H155 residues mutated to alanine, 
aspartic acid, and/or arginine were produced by transient transfection 
of Expi293 cells. 

Human PSGL-1-Fc and P-selectin-Fc chimeric proteins were purchased
from R&D Systems. PSGL-1 19-mer glycopeptides were produced
as previously described. In brief, the human PSGL-1 19-mer contained
the 19 N-terminal residues of PSGL-1 fused to a human IgG1 Fc via a (G)4S
linker and a TVMV protease site (19-mer-Fc). To enable PSGL-1 decoration
with sialyl-Lewis X (a tetrasaccharide containing sialic acid), PSGL-1
19-mer-Fc plasmid was co-expressed with and without the addition of
plasmids encoding glucosaminyl (N-acetyl) transferase (core 2, GCNT1)
and alpha (1,3)-fucosyltransferase-7 enzymes at an 8:1:1 ratio in Expi293
cells. The 19-mer-Fc fusion proteins were purified from supernatant by
MabSelectSure Protein A resin (GE Healthcare) followed by preparative
SEC (Superdex200, GE Healthcare). Where indicated, the 19-mer
fusion proteins were further fractionated into sulfotyrosine-enriched
and sulfotyrosine-depleted pools by separation on a Q HP (GE) column
in Tris buffer at pH 7.5. The presence of the sialyl-Lewis X and tyrosine
sulfation post-translational modifications was determined by HECA452
antibody binding enzyme-linked immunosorbent assay (ELISA), human
P-selectin binding ELISA, and mass spectrometry peptide mapping.

Human VSIG-3-Fc chimeric proteins were purchased from R&D Systems. 


Antibody generation 

Anti-human VISTA antibodies were produced in transgenic mice expressing
human immunoglobulin alleles in place of murine alleles. Mice
were immunized with recombinant human VISTA. Splenocytes from 
these mice were fused with the Sp2/0 myeloma cell line, and fusions that 
were positive for human IgG g/k antibody production were screened 
for VISTA reactivity by ELISA to his-tagged human VISTA and by flow 
cytometry to HEK293 cells stably expressing cell surface human VISTA. 
Alternatively, single chain fragment-variable (scFv) antibody libraries 
were created from genetic material isolated from immunized mice and 
screened by mRNA display for binding to recombinant human VISTA at 
pH 6.0 as previously described. Positively selected sequences were
reformatted as full-length human antibodies, produced via transient 
transfection of Expi293 cells, and validated for binding to VISTA by SPR. 
Where indicated, anti-human VISTA antibodies were converted to chi- 
meric antibodies with a mouse IgG1-D265A isotype. 

For imaging studies, VISTA.16 and VISTA.18 antibodies were fluores- 
cently labelled with Alexa Fluor 680 using the SAIVI Rapid Antibody 
Labelling Kit (ThermoFisher). Monomericity of the labelled antibodies 
was confirmed by SEC. The amount of fluorophore conjugation was meas- 
ured by LC-MS, and retention of VISTA binding was confirmed by SPR. 

Anti-mouse VISTA antibodies were produced in VISTA-knockout mice 
immunized with recombinant mouse VISTA. Splenocytes from immu- 
nized mice were fused with the Sp2/0 myeloma cell line. Hybridoma 
supernatants were screened for reactivity to recombinant mouse VISTA 
by ELISA and to cell surface mouse VISTA by flow cytometry. Promising 
candidates were then subcloned, sequenced, and expressed recombinantly
with a mouse IgG1-D265A isotype. The clone VISTA.10, which
bound mouse VISTA independent of pH and efficiently blocked its bind- 
ing to T cells, was selected for use in mouse studies. 
Antibody epitope binning and pH sensitivity

Competitive SPR epitope binning was used to identify VISTA-specific 
antibodies which cross-block the desired VISTA.4 or undesired VISTA.5 
epitopes using a Biacore T200 instrument. Antibodies VISTA.4 and 
VISTA.5 were diluted to 10 μg ml⁻¹ in 10 mM sodium acetate pH 4.5 and
immobilized onto the flow cells of a CM5 biosensor following the
manufacturer’s amine coupling protocol (GE Healthcare). Competition was
assessed at 25 °C using HBS-P+ running buffer (10 mM HEPES, 150 mM 
NaCl and 0.05% v/v Surfactant P20, pH 7.4). 100 nM monovalent human 
VISTA-ECD-His was captured by the immobilized antibodies, then each 
VISTA antibody screened was injected at 100 nM to evaluate VISTA.4 or 
VISTA.5-mediated co-binding or blocking activity. Two 30 s injections 
of 10 mM glycine pH 2.0 regenerated the VISTA.4 and VISTA.5 surfaces 
between assay cycles. Sensorgrams were analysed using Biacore T200 
Evaluation Software v.2.0. Antibodies blocked by VISTA.4 and antibodies 
that were not blocked by VISTA.4 or VISTA.5 (that is, different epitope), 
were assessed for pH-dependent VISTA binding at pH 7.4 and 6.0. 


Human T cell functional assays 

In co-culture experiments, CD4+ T cells were enriched from healthy
donor blood by negative selection (StemCell RosetteSep) and labelled
with the proliferation dye CellTrace Violet (ThermoFisher). 293T cells
were engineered to ectopically express a single chain variable fragment
of the agonistic CD3 monoclonal antibody OKT3 and human VISTA
(293T-OKT3-VISTA). CD4+ T cells and irradiated 293T-OKT3-VISTA cells
were co-cultured at a ratio of 4:1 in RPMI-1640 supplemented with 10%
v/v heat-inactivated fetal calf serum (FCS), 2 mM L-glutamine (Gibco),
2 mM non-essential amino acids (Gibco), 1 mM sodium pyruvate (Gibco),
55 μM β-mercaptoethanol and titrated human anti-human VISTA or
isotype-matched non-VISTA-binding control antibodies for 5 days.
Proliferation was calculated as the percentage of CD4+ T cells undergoing
CellTrace Violet dye dilution, as determined by flow cytometry.

In NF-κB phosphorylation experiments, tissue culture-treated 96-well
flat bottom plates were coated with OKT3 (0.5 μg ml⁻¹) and either human
VISTA-Fc or isotype-matched control antibody (5.0 μg ml⁻¹) in PBS at
37 °C for approximately 2 h. Where indicated, VISTA-Fc proteins with
H153, H154 and H155 residues mutated to alanine, aspartic acid, or
arginine were used in place of wild-type VISTA-Fc. The wells were then
washed with PBS and pre-incubated with human anti-human VISTA or
non-VISTA-binding control antibodies diluted to 5.0 μg ml⁻¹ in HBSS
acidified to various pH with MES for 30 min. T cells were suspended
in the same HBSS + MES buffers, added to the wells, centrifuged, and
cultured at 37 °C for 15 min. The cells were then fixed (Cytofix buffer,
BD Biosciences), permeabilized (Phosflow Permeabilization Buffer III,
BD Biosciences), and stained with anti-pNF-κB S529 (BD Biosciences)
and anti-mIgG secondary antibodies (Jackson Immunoresearch) before
being acquired on a flow cytometer. NF-κB phosphorylation was calculated
as a percentage of the pNF-κB MFI for CD4+ T cells stimulated at
pH 7.4 in wells that had been coated with OKT3 and isotype-matched
control and pre-incubated with soluble non-binding isotype-matched
control antibodies.
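
The normalization described above amounts to expressing each sample's pNF-κB MFI as a percentage of a reference MFI. The sketch below is one plausible reading of that calculation, not the authors' analysis code: the function name `percent_of_reference` and the example MFI values are hypothetical, and no background subtraction is assumed because the text does not mention one.

```python
def percent_of_reference(sample_mfi: float, reference_mfi: float) -> float:
    """Express a sample MFI as a percentage of the reference-condition MFI.

    The reference condition in the text is CD4+ T cells stimulated at pH 7.4
    on OKT3 plus isotype control, with soluble isotype control antibody.
    """
    if reference_mfi <= 0:
        raise ValueError("reference MFI must be positive")
    return 100.0 * sample_mfi / reference_mfi


# Hypothetical MFI values for illustration only.
print(percent_of_reference(sample_mfi=450.0, reference_mfi=900.0))
```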

In Jurkat assays, Jurkat cells were engineered to express luciferase
under the control of an NF-κB-inducible promoter. These Jurkat
NF-κB-luciferase cells were co-cultured with non-irradiated 293T-OKT3-VISTA
cells at a ratio of 4:1 in HBSS (ThermoFisher) acidified to various pH with
MES and human anti-human VISTA antibodies for 4 h. Jurkat cell activation
was measured by luciferase substrate assay (Promega).


Antibody epitope mapping 

VISTA antibody binding epitopes were mapped by yeast display and NGS
as previously described. In brief, a saturation mutagenesis library
of single point mutants of VISTA’s extracellular domain was generated
and displayed on the surface of yeast. VISTA mutants that lost binding
to the test antibody, but retained binding to a non-cross-blocking VISTA
antibody, were sorted and sequenced. The positions of the mutations
in these mutants were designated as energetically important residues
in the test antibody’s epitope.


SPR 

VISTA antibody binding to recombinant human VISTA-ECD-His protein
was measured at acidic and neutral pH (Biacore T200, GE Healthcare).
Protein A (ThermoFisher catalogue no. 21181) was diluted to 20 μg ml⁻¹
in 10 mM sodium acetate pH 4.5 and immobilized onto the flow cells
of a CM5 biosensor following the manufacturer’s amine coupling protocol
(GE Healthcare). All SPR experiments were conducted at 37 °C
using PBST (137 mM sodium chloride, 2.7 mM potassium chloride, 10
mM phosphate buffer and 0.05% Tween 20) running buffer at the indicated
pH. Antibodies were diluted to 20 nM in PBST pH 7.4, and were
captured on the protein A surface. A concentration series of 100-0.8
nM monovalent human VISTA-ECD-His was injected over the captured
antibodies at 40 μl min⁻¹ to measure association and dissociation. Two
15 s injections of 10 mM glycine pH 1.5 regenerated the Protein A capture
surface between assay cycles. Rate constants k_a (k_on, association rate)
and k_d (k_off, dissociation rate) were derived from reference flow cell
and 0 nM blank-subtracted sensorgrams, and were fit to a 1:1 binding
model in Biacore T200 Evaluation Software v.3.1.

VISTA.18 Fab was prepared by papain digest following the manufacturer’s
protocol (ThermoFisher catalogue no. 44985). Binding of VISTA.18 Fab
was performed using the same running buffer (PBST) at pH 7.4 and 6.0.
The previously described VISTA mutant Fc-fusion proteins were diluted
to 25 nM and captured for 60 s at 10 μl min⁻¹ using an anti-human Fc
CM5 sensor chip (GE Healthcare), following which VISTA.18 Fab was
injected at concentrations ranging from 0.8 to 100 nM. The surface
was regenerated using a 30 s pulse of 3 M MgCl2 at 30 μl min⁻¹. %Rmax
(normalized SPR binding response) for Fab binding was calculated as
(observed Rmax/calculated Rmax) × 100, where observed Rmax = RU (binding
response units) response at the end of the association phase; calculated
Rmax = (capture level/molecular mass of ligand) × molecular mass of
analyte × valency.
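
The %Rmax normalization above can be written out directly. The following is a minimal sketch of the stated formula; the function names and the numerical inputs (capture level, molecular masses, valency) are hypothetical values chosen for illustration, not measurements from this study.

```python
def calculated_rmax(capture_level_ru: float, ligand_mass_da: float,
                    analyte_mass_da: float, valency: int) -> float:
    """Theoretical maximum response:
    (capture level / ligand mass) x analyte mass x valency."""
    if ligand_mass_da <= 0:
        raise ValueError("ligand mass must be positive")
    return (capture_level_ru / ligand_mass_da) * analyte_mass_da * valency


def percent_rmax(observed_ru: float, capture_level_ru: float,
                 ligand_mass_da: float, analyte_mass_da: float,
                 valency: int) -> float:
    """Normalized response: (observed Rmax / calculated Rmax) x 100."""
    theoretical = calculated_rmax(capture_level_ru, ligand_mass_da,
                                  analyte_mass_da, valency)
    return observed_ru / theoretical * 100.0


# Hypothetical example: 200 RU of a 100 kDa bivalent Fc-fusion ligand
# captured, a 50 kDa Fab analyte, and 150 RU observed at the end of
# the association phase.
print(percent_rmax(observed_ru=150.0, capture_level_ru=200.0,
                   ligand_mass_da=100_000.0, analyte_mass_da=50_000.0,
                   valency=2))
```

Normalizing in this way lets binding be compared across mutants that were captured at different levels.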


Antibody engineering 

Antibody variant libraries were built by introducing aspartate, glutamate
and histidine substitutions in the CDRs of the heavy and light chain
variable regions. The CDR variants were synthesized as oligo pools (Twist
Biosciences), allowing for single and double amino acid substitutions
in each CDR. Each library was constructed to allow for a maximum of six
amino acid substitutions per chain. These libraries were expressed on
the surface of yeast and subjected to several rounds of binding to
recombinant human VISTA at pH 6.0. We then performed additional rounds of
selection toggling between positive selection for VISTA binding at pH
6.0 and negative selection for VISTA binding at pH 7.4. Selected variants
were reformatted as full-length human antibodies and validated by SPR.
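
To give a sense of the library design, the sketch below enumerates all single and double substitutions to aspartate, glutamate or histidine within one CDR. It is an illustrative assumption about how such variants could be listed, not the authors' library-design code: the function name `cdr_variants` and the toy CDR sequence are hypothetical, and combining CDRs up to the six-substitution limit per chain is not modelled.

```python
from itertools import combinations, product

SUBSTITUTIONS = "DEH"  # aspartate, glutamate, histidine


def cdr_variants(cdr: str, max_subs: int = 2) -> set[str]:
    """Enumerate CDR variants carrying 1..max_subs substitutions to D, E or H.

    Substitutions that would leave a residue unchanged are skipped, so every
    returned variant differs from the parent at exactly the chosen positions.
    """
    variants: set[str] = set()
    for n in range(1, max_subs + 1):
        for positions in combinations(range(len(cdr)), n):
            for residues in product(SUBSTITUTIONS, repeat=n):
                if any(cdr[p] == r for p, r in zip(positions, residues)):
                    continue  # not a real substitution at this position
                chars = list(cdr)
                for p, r in zip(positions, residues):
                    chars[p] = r
                variants.add("".join(chars))
    return variants


# Toy CDR for illustration only (not a sequence from this study).
toy_cdr = "GYTFTSYW"
print(len(cdr_variants(toy_cdr, max_subs=1)), "single-substitution variants")
print(len(cdr_variants(toy_cdr, max_subs=2)), "single- and double-substitution variants")
```

Even for one short CDR the number of double variants grows quickly, which is consistent with synthesizing the variants as oligo pools and selecting them by display.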


VISTA crystallography 

N91Q, N108Q and N190Q mutations were introduced into VISTA’s extracellular
domain to reduce glycosylation. Recombinant VISTA.18 fragment
antigen-binding (Fab) and low-glycosylation VISTA were labelled with
AlexaFluor 488 and 555, respectively. The labelled proteins were
co-incubated at a 4.8:1 molar ratio of VISTA:Fab in DPBS overnight at 4 °C.
The resulting VISTA-Fab complex was purified by SEC (Superdex 200
16/200 gel filtration column, GE Healthcare). Fractions corresponding to
the VISTA:Fab complex were pooled and concentrated to approximately
15 mg ml⁻¹. The complex was crystallized by combining 0.5 μl of the
complex with 0.5 μl of the precipitant (1.8 M ammonium sulfate, 0.1 M
phosphate/citrate, pH 4.2), over a reservoir containing 75 μl of additional
precipitant, in an MRC UVXPO sitting drop vapour diffusion crystallization
tray (Swissci) at room temperature. Initial crystals developed within
2 to 3 days, and grew to their full size of approximately 100 μm × 100 μm ×
400 μm over approximately 10 days. Crystal fluorescence was visualized
using a RockImager 1000 crystal imaging system (Formulatrix). The
crystals were cryoprotected by submersion in 3.4 M ammonium sulfate
and then flash frozen in liquid nitrogen for X-ray data collection.
X-ray diffraction data were collected at a wavelength of 1.0 Å and a
temperature of 100 K at the Advanced Photon Source (IMCA-CAT beamline
17-ID). Data reduction was performed using HKL2000 (HKL Research).
The VISTA:VISTA.18 Fab co-crystal structure was solved by molecular
replacement (MR) using Phaser and the coordinates from the VHVL
and CH1CL portions of an internally determined Fab crystal structure
as input models. The initial MR model provided enough phasing power
to enable confident building of VISTA from scratch. The structure was
completed through iterative cycles of model building using Coot and
restrained refinement using autoBUSTER (Global Phasing). A summary
of the data collection and refinement statistics is provided in
Supplementary Table 1. The VISTA:VISTA.18 Fab co-crystal structure has been
deposited into the RCSB PDB under accession number 6MVL.


Receptor-based ligand capture and mass spectrometry 

Receptor-based ligand capture and mass spectrometry was performed
using TriCEPS (Dualsystems Biotech) as previously described, with
the exception that the labelling buffer was acidified to pH 6.0. In brief,
capture was performed on human CD4+ T cells with VISTA-Fc chimeric
protein bait. The anti-human CD3 antibody OKT3 was used as a control
bait. For labelling, 300 μg each of OKT3 and human VISTA-Fc chimeric
protein were buffer exchanged to 150 μl of 25 mM HEPES at pH 8.2.
TriCEPS v. 3.03 (150 μg) was added to each reaction, mixed and incubated
at room temperature with gentle shaking for 90 min. In parallel, 600
million human CD4+ T cells were enriched from healthy donor blood by
RosetteSep (StemCell) and suspended in PBS and 1% v/v heat-inactivated
fetal calf serum at pH 6.0 (labelling buffer). The cells were then oxidized
by treatment with 1.5 mM sodium metaperiodate at 4 °C for 15 min. After
oxidation, the cells were washed, divided into two parts, and incubated
with either the VISTA bait or the OKT3 bait at 4 °C with gentle shaking for
90 min. After labelling, the cells were washed, pelleted and snap frozen.


VISTA, PSGL-1, P-selectin, KPL1, GP1BA, and VSIG-3 Octet

Binding interactions were detected using an OctetRed384 BLI instrument
(PALL/ForteBio). All assay steps were conducted at 30 °C at 1,000
r.p.m. shake speed. Unless otherwise noted, the pH 6.0 buffer contained
50 mM MES, 200 mM sodium chloride, 4 mM calcium chloride and 0.05%
v/v Tween 20. The pH 7.4 buffer contained 10 mM HEPES, 150 mM sodium
chloride, 4 mM calcium chloride and 0.05% v/v Tween 20. These buffers
were used for the full duration of the assays.

For binding experiments measuring VISTA, KPL1, and P-selectin binding
to PSGL-1, the human PSGL-1 19-mer-Fc fusion protein (see recombinant
proteins methods) was first captured on anti-human IgG-Fc sensors
(AHC, PALL/ForteBio). Where specified, the human PSGL-1 19-mer-Fc
fusion protein used was produced in cells not transfected with
α1,3-fucosyltransferase and core 2 β1,6-N-acetylglucosaminyltransferase,
or was separated by anion exchange liquid chromatography into
sulfotyrosine-rich and sulfotyrosine-poor fractions. The anti-human
capture sensors were then blocked with total human IgG (Jackson
Immunoresearch). Binding to 500 nM wild-type human VISTA-Fc fusion
protein (R&D Systems), 50 nM human P-selectin-Fc fusion protein (R&D
Systems) and 200 nM KPL1 (R&D Systems) was measured for 10 min.

For antibody and VSIG-3 competition experiments, human VISTA-Fc
was diluted to 400 nM and premixed for 30 min with 0 nM, 200 nM,
400 nM or 800 nM of each test antibody or human VSIG-3-Fc fusion
protein (R&D Systems) before assessing binding. For KPL1 and human
P-selectin competition experiments, captured human PSGL-1 19-mer-Fc
fusion protein was blocked using 400 nM negative control antibody,
KPL1 (Millipore Sigma) or 400 nM human P-selectin (R&D Systems),
then dipped into titrated human VISTA-Fc.

For BLI binding experiments evaluating multi-pH interactions of
VISTA to VSIG-3, CD42b/GP1Bα, PSGL-1 and VISTA, all assay steps were
performed in DPBS buffer (Gibco) containing 0.05% v/v Tween 20,
pH-adjusted to 5.8, 6.2, 6.6, 7.0 or 7.4 as indicated. The experimental
conditions described above were applied, with the exception that 200
nM human VISTA-Fc was first captured to AHC sensors, and binding
was instead measured to 500 nM wild-type human PSGL-1 19-mer-Fc
fusion protein, 500 nM Y>A mutant human PSGL-1 19-mer-Fc fusion
protein, 500 nM human VSIG-3-Fc fusion protein (R&D Systems), 500
nM human CD42b/GP1Bα (R&D Systems) or 500 nM human VISTA-Fc
fusion protein (R&D Systems).

For BLI experiments measuring PSGL-1 binding to VISTA histidine
mutants, VISTA H153/H154/H155 mutants with a human Fc tag at 500
nM were captured on anti-human IgG Fc sensor (Pall/Forte Bio) for 10
min, blocked with total human IgG (Jackson Immunoresearch) and
dipped into human PSGL-1 19-mer-Fc (125, 250 and 500 nM) to measure
association for 10 min.

Double reference-subtracted sensorgrams were analysed in Data 
Analysis 9.0 (PALL/ForteBio), and binding responses at the end of the 
association phase are reported. 


PSGL-1 glycopeptide ELISA 

Purified human PSGL-1 19-mer-Fc proteins and control human Fc (Jackson
ImmunoResearch) were adsorbed to ELISA plates (Nunc Maxisorb)
at 10 μg ml⁻¹ in DPBS overnight at 4 °C. Plates were blocked with 1%
BSA-0.05% Tween 20 (Teknova) for 1 h, and dilutions of the anti-sLewisX
monoclonal antibody HECA452 (Santa Cruz Biotechnology) were made in
the same buffer across the plate. Plates were incubated for 4 h at room
temperature, washed and incubated for 1 h with anti-rat IgM-HRP (Jackson
ImmunoResearch), followed by washing and detection with TMB
substrate (Thermo). Plates were read in a Spectramax plus instrument
using Softmaxpro (Molecular Devices).

For P-selectin ELISA assays, purified human PSGL-1 19-mer-Fc proteins
and control human Fc (Jackson ImmunoResearch) were adsorbed to
ELISA plates (Nunc Maxisorb) at 1 μg ml⁻¹ in DPBS overnight at 4 °C. Plates
were blocked with 1% BSA-0.05% Tween 20 (Teknova) with added 0.5
mM MgCl2 and 1 mM CaCl2 for 1 h. All subsequent additions were made in
this same buffer. Dilutions of human P-selectin-Fc (R&D systems) were
incubated 1 h at room temperature. After washing, biotinylated goat
anti-Hu P-selectin (R&D systems) was added at 1:4,000 for 1 h at room
temperature. This was followed by washing and adding streptavidin-HRP
(Thermo) at 1:4,000 for 1 h, then washing again and detection with TMB
substrate (Thermo). Plates were read in a Spectramax plus instrument
using Softmaxpro (Molecular Devices).


Mass spectrometry 

PSGL-1 19-mer glycopeptide-Fc-fusion proteins were denatured in the
presence of 0.5% Rapigest surfactant, reduced, alkylated, and digested
by pepsin or trypsin and Glu-C. Data were acquired on a QEPlus mass
spectrometer (ThermoFisher) connected to an Acquity UPLC (Waters) and
analysed using Byonic software (Protein Metrics). Results were manually
verified.


Isothermal titration calorimetry 

VISTA-Fc, VISTA-His and PSGL-1 19-mer-Fc were dialysed against
phosphate buffered saline (PBS) at pH 6.0 or pH 7.4 over 16 h at 4 °C. The
concentrations of the proteins were determined by UV absorbance
using a Lunatic instrument (Unchained Labs). The concentrations used
for the ITC experiments are listed in Supplementary Table 2. The ITC
experiment with VISTA-Fc titration at 352 μM into 45 μM PSGL-1-Fc at
pH 7.4 was performed using a MicroCal PEAQ-ITC (Malvern Panalytical).
This experiment involved a single injection of 0.4 μl followed by
18 injections of 2 μl with an injection duration of 4 s and a 150 s spacing
between the injections. The reference power was set to 10 μcal s⁻¹,
and the stirring speed was 750 r.p.m. Data processing was performed
using MicroCal PEAQ-ITC analysis software v.1.21. The rest of the ITC
experiments were performed by a MicroCal Auto-iTC200 (Malvern
Panalytical). These experiments started with a single injection of 0.5 μl
followed by 19 injections of 2 μl with an injection duration of 4 s and an
injection spacing of 180 s. The reference power and the stirring speed
were 8 μcal s⁻¹ and 1,000 r.p.m., respectively. Data were processed using
MicroCal Origin analysis software v.7.0 using the ‘one set of sites’ model.
The fitted association constant (K_A) values were used to determine the
dissociation constants (K_D) using the equation K_D = 1/K_A. The free energy
of binding (ΔG) was calculated using the relationship ΔG = −RT ln K_A,
with R and T being the gas constant and the temperature, respectively.
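
The two relationships above, K_D = 1/K_A and ΔG = −RT ln K_A, can be checked numerically with a short sketch. The function names, the example association constant and the default temperature of 298.15 K are assumptions made for illustration; the text does not state the temperature of the ITC runs.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def kd_from_ka(ka_per_molar: float) -> float:
    """Dissociation constant (M) from association constant (M^-1)."""
    if ka_per_molar <= 0:
        raise ValueError("K_A must be positive")
    return 1.0 / ka_per_molar


def delta_g_kcal_per_mol(ka_per_molar: float,
                         temperature_k: float = 298.15) -> float:
    """Free energy of binding, dG = -R T ln(K_A), in kcal/mol.

    The default temperature is an assumed value, not one taken from the text.
    """
    if ka_per_molar <= 0:
        raise ValueError("K_A must be positive")
    return -R_KCAL_PER_MOL_K * temperature_k * math.log(ka_per_molar)


# Hypothetical fitted K_A of 1e6 M^-1, i.e. a K_D of 1 micromolar.
ka = 1.0e6
print(f"K_D = {kd_from_ka(ka):.2e} M")
print(f"dG  = {delta_g_kcal_per_mol(ka):.2f} kcal/mol")
```

A larger K_A gives a smaller K_D and a more negative ΔG, which is the sense in which tighter binding is more favourable.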


T cell CRISPR

Human CD4+ T cells were enriched by RosetteSep (Miltenyi) from whole
blood and stimulated with plate-coated antibodies against CD3 and
CD28 (OKT3 and CD28.2 respectively, ThermoFisher). After 2 days, the
cells were transfected in triplicate with Cas9 ribonucleoproteins (RNPs,
Dharmacon) loaded with PSGL-1 guide RNAs (gRNAs)
CACCAGCGCCAAGATTAGGA and CACTCAAAACCACAGCCATGG. gRNAs targeting
CD4 and GFP (with no human homology) were used as controls. The
transfected cells were then further stimulated with CD3/CD28 beads
(Human T-activator Dynabeads, Miltenyi). After 4 days, VISTA multimer
binding was assessed with gating on PSGL-1-knockout and PSGL-1+ cells.


PSGL-1-VISTA docking 

The crystal structure of the human PSGL-1 19-mer peptide bound to
human P-selectin contains coordinates for 13 of the 19 residues
(QATEYEYLDYDFLPETEPP, residues with coordinates in bold). The structure
was modelled in Maestro v.10.7.015 by addition of the remaining six
residues and energy minimization using the OPLS force field for peptides
as implemented in the ‘protein preparation’ feature. The IgV domain
of human VISTA was taken from the crystal structure reported here of
anti-VISTA Fab complexed with VISTA. Missing side chains and loops
were filled in using protein preparation in Maestro. The anti-VISTA Fab
binding site was used for docking the PSGL-1 19-mer peptide to VISTA
using the suggested protocol for protein-protein docking described
in the BioLuminate module in Maestro.


Wild-type and VISTA-knockout mice 

Wild-type C57BL/6J mice were obtained from Jackson Laboratories and
Charles River Laboratories. VISTA-knockout mice were obtained from
the University of California Davis Knockout Mouse Project (KOMP).


Human VISTA knock-in mice 

A Vista-targeting vector was constructed from genomic C57BL/6N
mouse strain DNA (genOway). A human VISTA sequence coding for the
mature protein was inserted in frame with mouse exon 3 downstream of
the mouse signal peptide, replacing the coding portion of exon 3 and part
of intron 3. This insertion eliminates expression of the mouse Vista gene.
A loxP-flanked neomycin resistance cassette was inserted in intron 2.

Linearized targeting vector was transfected into C57BL/6N embryonic
stem cells. G-418 resistant embryonic stem cell clones were screened
for locus recombination by PCR and Southern blot. The integrity of the
humanized cassette was confirmed by sequencing.

Recombined embryonic stem cells were microinjected into C57BL/6N
blastocysts, giving rise to male chimaeras. These mice were crossed with 
C57BL/6N mice expressing Cre recombinase to produce heterozygous 
VISTA-humanized mice devoid of the neomycin resistance cassette. The 
presence of the wild-type mouse allele (6.1 kb) and the humanized allele 
(7.1 kb) was confirmed by Southern blot using external probes of PciI-
digested DNA. Heterozygous animals were then interbred to produce
homozygous VISTA-humanized mice. 


Fluorescence optical imaging 

Mouse colorectal carcinoma MC38 cells were cultured in Dulbecco's 
Modified Eagle Media (DMEM) supplemented with 10% v/v heat-inactivated
FCS in vitro. Six- to thirty-week-old female human VISTA knock-in
mice were subcutaneously injected with 0.5-1.0 × 10⁶ MC38 cells each
while being fed an alfalfa-free diet (Teklad catalogue no. TD.97184) to 
minimize autofluorescence in the gastrointestinal tract. Tumour growth 
was monitored by caliper. When tumours reached an average volume 
of 70-110 mm³, typically 7-10 days after implantation, mice were randomized
into groups on the basis of tumour volume. Mice received
a single intravenous injection of 3 mg kg⁻¹ Alexa Fluor 680-labelled
VISTA.16 or VISTA.18. Subsets of mice were euthanized at 2.5, 24 and 51
h post-injection for imaging of heart, liver, spleen, kidney, lung, stomach,
intestine, muscle, tumour and whole blood. Images were acquired
under the following parameters: excitation = 679 nm, emission = 702
nm, exposure = auto-setting, binning = 8, F/stop = 2, FOV = 22.3 cm and
focus = 1.5 cm. Fluorescence intensities were quantified in units of radiant
efficiency using Living Image software (v.4.5, PerkinElmer). The
average intensities of the tissues of interest were normalized to the 
average number of fluorophores conjugated to VISTA.16 (fluorophore 
to antibody ratio, 0.79) and VISTA.18 (fluorophore to antibody ratio, 
1.31) and set to the same colour scale before defining and representing 
regions of interest. 
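
As a minimal sketch of the normalization step described above, the snippet below divides a tissue's average radiant efficiency by the fluorophore-to-antibody ratio of the injected antibody. The two ratios (0.79 and 1.31) come from the text; the function name, the data layout and the intensity values are hypothetical and serve only as an illustration.

```python
# Fluorophore-to-antibody ratios reported in the text.
FLUOROPHORE_TO_ANTIBODY_RATIO = {"VISTA.16": 0.79, "VISTA.18": 1.31}


def normalize_intensity(antibody, radiant_efficiency):
    """Scale an average radiant efficiency to one fluorophore per antibody."""
    return radiant_efficiency / FLUOROPHORE_TO_ANTIBODY_RATIO[antibody]


# Hypothetical average tumour intensities (arbitrary units), not measured data.
raw = {"VISTA.16": 7.9e8, "VISTA.18": 1.31e9}
normalized = {ab: normalize_intensity(ab, value) for ab, value in raw.items()}
```

After this scaling, intensities from the two differently labelled antibodies can be compared on one colour scale, which appears to be the purpose of the step.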


Tumour studies 

Mouse colorectal carcinoma MC38 cells were cultured as described 
above. Six- to twelve-week-old female C57BL6/J mice, VISTA-knockout
mice and their wild-type littermates, and human VISTA-knock-in mice
were subcutaneously injected with 0.5-1.0 × 10⁶ MC38 cells per mouse.
Tumour growth was monitored by caliper. When tumours reached an
average volume of 70-110 mm³, typically 7-10 days after implantation,
mice were randomized into groups on the basis of tumour volume. Mice
received 100 μg anti-PD-1, 600 μg anti-VISTA (mouse- or human-reactive
as indicated), and/or 600 μg of a control anti-diphtheria toxin IgG1-D265A
antibody administered four times by intraperitoneal injection every 3-4 
days. All antibodies were IgG1-D265A (Fc-inert) isotype. Researchers 
were not blinded to treatment assignment. Animals were continuously 
monitored, and mice were euthanized via asphyxiation when any of the 
following endpoints were met: study termination, tumour burden equal to
or greater than 2,000 mm³, tumour ulceration, body weight loss equal to or
greater than 20%, or moribund appearance. Mice whose tumours were
unmeasurable or below 10 mm³ were considered to be tumour-free.


Ex vivo mouse leukocyte profiling 

Where indicated, the tumours, spleens, and/or blood of mice were profiled
ex vivo 7 days after the start of treatment. Tumours were enzymatically
disassociated with 250 U ml⁻¹ collagenase IV (ThermoFisher)
and 100 mg ml⁻¹ DNase I (Roche) in HBSS supplemented with 5% v/v
heat-inactivated FCS and 5 mM CaCl₂ and mechanically disassociated
with GentleMacs cell disruptors (Miltenyi). Spleens were mechanically 
disassociated only. Spleen and blood samples were treated with ACK 
red blood cell lysis buffer (ThermoFisher). Cells were stained with the 
antibodies CD3, CD4, CD8, CD11b, CD45, F4/80, FoxP3, Ly6C, Gr1, MHC- 
II, mouse VISTA, human VISTA, PD-1, LAG-3 and TIM-3. Blocking was 
performed with anti-CD16/32 (2.4G2, BD Biosciences). Intracellular 
staining was performed with the FoxP3 Fixation/Permeabilization buffer 
kit (ThermoFisher). 


Antibody pharmacokinetics 
In mice, VISTA.16 and VISTA.18 were evaluated following intravenous 
injections into 6-12-week-old female C57BL6/J mice as well as human VISTA
knock-in mice and their wild-type littermates at a dose of 5 mg kg⁻¹ (n = 1-4
mice per antibody and genotype as indicated). Serial blood samples were 
collected at 0.25, 1, 6, 24, 48, 96, 168, 264, 336 and 504 h post-injection. 
In cynomolgus macaques, VISTA.4 and VISTA.18 were evaluated following
10 min intravenous infusions into protein-naive 5-6-year-old
male cynomolgus monkeys at a dose of 5 mg kg⁻¹ (n = 1 per antibody).
Serial blood samples were collected at 0.17, 0.5, 2, 4, 6, 24, 48, 72, 168, 
216, 240 and 336 h post-infusion. 


Subsequently, serum samples were obtained for antibody concentration
analysis using a ligand-binding assay that employed
recombinant VISTA as a capturing agent and an anti-human IgG Fc mAb as a
detecting agent. The lower limit of quantification for the assay was 1
ng ml⁻¹. Mean residence times were estimated by non-compartmental
analysis of the serum mAb concentration-time data using Kinetica software
(v.5.0, Thermo Fisher Scientific).
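
For readers unfamiliar with non-compartmental analysis, the following is a minimal sketch of one common way to estimate mean residence time (MRT) as AUMC/AUC with the linear trapezoidal rule. It is not a description of Kinetica's algorithm. It assumes an intravenous bolus, integrates only over the sampled interval (no extrapolation to infinity) and applies no infusion-time correction. The sampling times are the mouse time points given in the text; the concentrations are hypothetical.

```python
import math


def trapezoid(x, y):
    """Linear trapezoidal integral of y over x."""
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2.0
               for i in range(len(x) - 1))


def mean_residence_time(times_h, conc):
    """MRT = AUMC / AUC over the sampled interval (no extrapolation)."""
    auc = trapezoid(times_h, conc)
    aumc = trapezoid(times_h, [t * c for t, c in zip(times_h, conc)])
    return aumc / auc


# Mouse sampling times from the text (hours post-injection).
MOUSE_SAMPLING_H = [0.25, 1, 6, 24, 48, 96, 168, 264, 336, 504]
# Hypothetical mono-exponential serum concentrations, not measured data.
example_conc = [100.0 * math.exp(-t / 200.0) for t in MOUSE_SAMPLING_H]
example_mrt_h = mean_residence_time(MOUSE_SAMPLING_H, example_conc)
```

Because the integral is truncated at the last sample, this sketch underestimates MRT whenever antibody is still measurable at the final time point.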


Statistics 
Except where indicated, statistics depict means, standard errors of 
the mean, and one-way ANOVA with Dunnett’s multiple comparisons. 


Receptor-based ligand capture statistics were performed as previously 
described”. 
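
The summary statistics named above can be illustrated with a short, dependency-free sketch that computes a standard error of the mean and the one-way ANOVA F statistic. This is an illustration only, not the authors' analysis pipeline. Dunnett's multiple-comparison test and the P values are not reproduced, because they need distribution functions outside the standard library. All function names and input values are hypothetical.

```python
import math


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def sem(values):
    """Standard error of the mean using the sample standard deviation."""
    m = mean(values)
    variance = sum((v - m) ** 2 for v in values) / (len(values) - 1)
    return math.sqrt(variance / len(values))


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical control group followed by two treatment groups.
example_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_between, df_within = one_way_anova_f(example_groups)
```

In a Dunnett-style analysis, the first group would serve as the single control against which each other group is compared.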


Research ethics 

Human blood was obtained from a research blood donation program 
administered by the Bristol-Myers Squibb Occupational Health and 
Wellness department. The program was operated in compliance with all 
relevant ethical regulations, and written informed consent was obtained 
from all donors. 

Animal studies were conducted in compliance with all relevant ethical 
regulations. Animal studies performed at Bristol-Myers Squibb were 
approved by the Bristol-Myers Squibb Institutional Care and Animal Use 
Committee. Animal studies performed at Five Prime Therapeutics were 
approved by the Five Prime Therapeutics Institutional Care and Animal 
Use Committee. Animal studies performed at Murigenics were approved 
by the Bristol-Myers Squibb Animal Welfare Risk Assessment Team and 
by the Murigenics Institutional Care and Animal Use Committee. 


Materials availability 


All unique biological materials are available subject to a material transfer 
agreement. 


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 

All data are available from the corresponding author and have been 
included in the manuscript or Supplementary Information. The 
VISTA:VISTA.18 Fab co-crystal structure has been deposited into the 
Protein Data Bank under accession number 6MVL. 
Extended Data Fig. 1 | Conservation of VISTA pH selectivity. a, Alignment of
human VISTA extracellular domain amino acid residues 94-165 with
chimpanzee, cynomolgus macaque, dog, rat and mouse equivalents. Histidine
residues are highlighted in red. b, Human monocytes (left) and neutrophils
(right) labelled with VISTA multimers at pH 6.0 and pH 7.4. Cells labelled with
non-VISTA-loaded multimers (control) or left unstained (FMO) are also depicted.
These data are representative of two independent experiments. c, VISTA
multimer (blue) and non-VISTA-loaded multimer (control, black) binding to the
activated human T cells depicted in Fig. 1c. Data are VISTA multimer MFI and are
representative of six independent experiments. d, Human VISTA-Fc binding to
human PBMC NK cells at pH 6.0 (blue), T cells at pH 6.0 (red) and T cells at pH 7.0
(black). T cells stained at pH 6.0 with the anti-human Fc secondary but not with
VISTA-Fc are included as a control (grey filled). These data are representative of
ten independent experiments. e, Mouse VISTA-Fc binding at pH 6.0 to wild-type
mouse splenic CD8⁺ T cells (red), CD4⁺ T cells (orange), B cells (green) and CD11b⁺
myeloid cells (blue). VISTA-Fc binding at pH 7.0 (black) and isotype-matched 
human IgG binding at pH 6.0 (grey filled) are included as controls. These data are 
representative of five independent experiments. f, Competitive SPR epitope 
binning of VISTA-specific antibodies against VISTA.4 and VISTA.5. Each row 
represents a unique clone, and for each clone, green indicates no cross-blocking, 
and red indicates cross-blocking. These data are representative of one 
experiment. g, h, Antibody blocking of VISTA-multimer binding to T cells as 
described in Fig. 1. Data are VISTA multimer MFI normalized to control and are 
representative of one experiment. g, Blocking activity by antibodies from the 
VISTA.4 epitope bin. VISTA.4 itself is depicted as black squares. h, Blocking 
activity by antibodies from the VISTA.5 epitope bin. VISTA.5 itself is depicted as 
black downwards triangles. 
Extended Data Fig. 2 | Effects of pH on VISTA function and antibody binding.
a, Representative histograms of CellTrace Violet dilution (left) and supernatant
IFN-γ (right) by CD4⁺ T cells co-cultured with 293T-OKT3-VISTA cells in the
presence of VISTA.4 (red), VISTA.5 (blue), a non-VISTA-binding isotype-matched
antibody (control, black), or without 293T-OKT3-VISTA cells (grey filled). Data
are mean ± s.e.m. with one-way ANOVA and Dunnett's multiple comparisons.
*P = 0.0498. n = 3 T cell donors; these data are representative of seven
independent experiments. b, Per cent of CD4⁺ T cells that proliferated following
co-culture with 293T-OKT3-VISTA or 293T-OKT3 cells and VISTA.4 or an
isotype-matched non-VISTA-binding control antibody. These data are representative of
two independent experiments. c, NF-κB phosphorylation in human CD4⁺ T cells
following stimulation with plate-coated OKT3 and VISTA-Fc in the presence of
the antibodies VISTA.4 (green upward triangles), VISTA.5 (blue downward
triangles) and a non-VISTA-binding control (antibody control, red squares) at
various pH. Cells stimulated with OKT3 and a plate-coated control antibody
(VISTA control, black circles) and without OKT3 (grey diamond) are included as
controls. Data are mean ± s.e.m. pNF-κB MFI normalized to control. n = 2 T cell
donors; these data are representative of two independent experiments.
d, e, Jurkat NF-κB-luciferase reporter cells were co-cultured with 293T-OKT3 or
293T-OKT3-VISTA cells and with VISTA.4 or non-VISTA-binding isotype-matched
control antibody. These data are a composite of three independent
experiments. d, Luciferase signal after culture with 293T-OKT3 cells (blue
circles) or without 293T cells (black triangles). e, Per cent increase in the
luciferase signal with VISTA.4 treatment during culture with 293T-OKT3-VISTA 
cells. f, VISTA.4 (red) and VISTA.5 (blue) binding epitopes on the human VISTA 
extracellular domain. g, Human VISTA SPR binding sensorgrams for the blocking 
antibody VISTA.4 (left; pH 6.0, light red; pH 6.7, red; pH 7.4, dark red) and the 
non-blocking antibody VISTA.5 (right; pH 6.0, light blue; pH 6.7, blue; pH 7.4, 
dark blue). Overlaid sensorgrams are 100 nM VISTA binding responses, 
normalized to the binding report point. These data are representative of six 
independent experiments. h, Cell binding of VISTA.4 (pH 6.0, orange downward 
triangles; pH 7.0, red squares), VISTA.5 (pH 6.0, green diamonds; pH 7.0, blue 
circles), and a non-VISTA-binding antibody (pH 6.0, unfilled circles; pH 7.0,
unfilled upward triangles) to Raji cells ectopically expressing VISTA. Data are
VISTA antibody MFI and are representative of five independent experiments.

i, VISTA antibody epitope binning against VISTA.4 (centre row) and VISTA.5 
(bottom row). Each row represents a unique clone, and for each clone, green 
indicates a lack of cross-blocking and red indicates cross-blocking. Binding 
capacity at pH 6.0 relative to binding capacity at pH 7.4 is also depicted (top 
row). For binding at acidic pH, red indicates a greater than threefold impairment 
in k,at pH 6.0, green indicates a less than threefold impairment, and white 
indicates no data. These data are representative of one experiment. 
Extended Data Fig. 3 | Acidic pH-selective antibody engineering.
a, Schematic depicting the method by which the VISTA.4 antibody was
engineered to identify variants with improved binding at acidic pH. b, Schematic
depicting the libraries of VISTA antibody variants used for screening acidic
pH-selective variants. c, Schematic depicting the iterative screening strategy for
identification of acidic pH-selective VISTA antibody variants. d, Cell binding of
the acidic pH-selective antibody VISTA.18 to Raji cells ectopically expressing
VISTA at pH 6.0 (red circles), pH 6.4 (orange squares), pH 6.6 (green diamonds),
pH 7.0 (blue upward triangles), pH 7.2 (purple downward triangles) and pH 8.1
(black hexagons). Data are VISTA antibody MFI and are representative of three
independent experiments. 
Extended Data Fig. 4 | Co-crystallization of VISTA and VISTA.18. The VISTA
IgV domain (labelled with Alexa Fluor 555) and the VISTA.18 Fab (labelled with
Alexa Fluor 488) were co-crystallized as described in Fig. 4 and the Methods.
a, Representative bright-field (left), Alexa Fluor 488 fluorescence (centre), and
Alexa Fluor 555 fluorescence (right) images of the crystals, indicating the
presence of both VISTA and VISTA.18. These data are representative of one
experiment. b, c, Superimpositions of the molecular surfaces of the yeast
display-defined epitopes for VISTA.18 (purple, b) and VISTA.5 (non-blocking,
orange, c) on the VISTA IgV domain (green). d, 2m|Fo − DFc| electron density map
(blue mesh) contoured to 1σ about the VISTA histidine triad (green sticks) and
VISTA.18 HCDR3 (grey sticks). e, Human VISTA SPR binding data for the acidic
pH-selective antibody VISTA.18 and variants of VISTA.18 in which the indicated
residues have been reverted back to their identity in VISTA.18's
non-acidic-pH-selective parent, VISTA.16. VISTA.16 and a non-VISTA-binding isotype-matched
control antibody are included as controls. These data are representative of one
experiment.
Extended Data Fig. 5 | PSGL-1 glycopeptide characterization. a-c, ELISA
binding curves of human PSGL-1 19-mer-Fc proteins produced with (blue lines)
or without (red lines) FUT7 and Core2 co-transfection. Binding curves for
isotype-matched control IgG are also shown (green lines). Data are absorbance
at 450 nm and are representative of three independent experiments. a, Binding
to the anti-human PSGL-1 antibody KPL1. b, Binding to the anti-sialyl-Lewis X
moiety antibody HECA452. c, Binding to recombinant human P-selectin-Fc.
d, Extracted ion chromatograms of the peptide YLDY in PSGL-1 19-mer-Fc
proteins produced with or without FUT7 and Core2 co-transfection and with or
without fractionation as indicated. The percentage of total YLDY that was
sulfated is indicated for each sample. These data are representative of one
experiment.
Extended Data Fig. 6 | Further characterization of VISTA binding to PSGL-1.
a-d, Isothermal titration calorimetry (ITC) measurements of the interaction
between PSGL-1 and VISTA. Top plots depict the raw calorimetric data of the
titrations, and the bottom plots depict the integrated data corrected for the
heat of dilution. The one set of sites model was used for data fitting. These data
are representative of one experiment. a, Titration of 130 μM PSGL-1-Fc into 10
μM VISTA-Fc at 25 °C and pH 6.0. b, Titration of 130 μM PSGL-1-Fc into 10 μM
VISTA-Fc at 37 °C and pH 6.0. c, Titration of 200 μM PSGL-1-Fc into 10 μM
VISTA-His at 25 °C and pH 6.0. d, Titration of 130 μM PSGL-1-Fc into 10 μM VISTA-Fc at
25 °C and pH 7.4. The thermodynamic parameters determined by ITC are listed
in Supplementary Table 2. e, Effects of PSGL-1-Fc (red circles) and P-selectin-Fc
(blue squares) recombinant proteins on VISTA multimer binding to activated
human CD4⁺ T cells. A non-binding antibody (black triangles) is included as a
control. Data are VISTA multimer MFI normalized to control and are
representative of two independent experiments. f, Effects of the indicated
VISTA antibodies on PSGL-1 19-mer-Fc fusion protein Octet binding to VISTA-Fc
fusion protein at pH 6.0. A non-VISTA-binding antibody is included as a control.
Data are BLI binding magnitudes and are representative of two independent
experiments. g, Effects of the PSGL-1 antibody KPL1 and recombinant P-selectin
on human PSGL-1 19-mer-Fc fusion protein Octet binding to VISTA-Fc fusion
protein at pH 6.0. A non-PSGL-1-binding antibody (control) and no added
antibody are included as controls. Data are BLI binding magnitudes and are
representative of two independent experiments. h, Effects of KPL1 (blue circles)
on human VISTA-Fc binding to human PBMC monocytes at pH 6.0. An isotype-matched
non-PSGL-1-binding antibody is included as a control. Data are VISTA-Fc
MFI and are representative of two independent experiments. i, Percentage of
T cells with no PSGL-1 expression after CRISPR using guides against PSGL-1, CD4
or a scrambled control. These data are representative of five independent
experiments. j, VISTA multimer (blue circle) binding to CHO cells expressing
PSGL-1 at various pH. Non-VISTA-loaded multimer (black square) binding is
included as a control. Data are VISTA multimer MFI normalized to control and
are representative of two independent experiments. k, Human VISTA-Fc
binding to wild-type and heparan sulfate-deficient CHO-K1 cells (red and orange,
respectively) at pH 6.0. Isotype-matched control antibody binding to wild-type
and heparan sulfate-deficient CHO-K1 cells (blue and green, respectively) at pH
6.0 is also shown. These data are representative of two independent
experiments. 
Extended Data Fig. 7 | Other candidate VISTA receptors. a, BLI binding
magnitudes for GPIBA-His (blue squares) and PSGL-1 19-mer-Fc (red circles) to
captured VISTA-Fc at the indicated pH. These data are representative of one 
experiment. b, VISTA multimer binding histograms to human platelets. Binding 
was performed in the presence of non-VISTA-binding control antibodies (pH 7.4, 
blue; pH 6.0, red) or the VISTA.4 blocking antibody (pH 6.0, green). Unstained 
platelets (grey filled histogram) are included as controls. These data are 
representative of two independent experiments. c, BLI binding magnitudes for 
VSIG-3-Fc binding to captured VISTA-Fc at the indicated pH. These data are 
representative of two independent experiments. d, Left, anti-VISTA stained 
(red), control stained (blue) or unstained (black) parental HEK293 (top) and 
VISTA-expressing HEK293 cells (bottom). Right, VSIG-3-Fc (red) or control-Fc 
(blue) binding to the same cells at the indicated pH. These data are 
representative of two independent experiments. e, Left, anti-VSIG-3 stained 
(red), control stained (blue) or unstained (black) parental CHO (top) and VSIG-3- 
expressing CHO cells (bottom). Right, VISTA-Fc (red) or control-Fc (blue) 
binding to the same cells at the indicated pH. These data are representative of
two independent experiments. f, BLI binding magnitudes of VISTA-Fc binding to
captured PSGL-1 19-mer-Fc at pH 6.0. Competition was provided at the indicated
concentrations by a non-binding control antibody (grey bars), the VISTA
blocking antibody VISTA.16 (yellow bars), the VISTA non-blocking antibody 
VISTA.5 (blue bars) or human VSIG-3-Fc fusion protein (purple bars). Darker bars 
depict the BLI binding magnitudes of competitors without VISTA. These data are 
representative of one experiment. g, BLI binding magnitudes of VSIG-3-Fc at the 
indicated concentrations binding to captured VISTA-Fc at pH 6.0. Competition 
was provided by buffer alone (blue bars), human PSGL-1 19-mer-Fc (green bars),
non-binding isotype-matched control antibody (red bars), VISTA.16 (yellow
bars), VISTA.18 (purple bars), or VISTA.5 (orange bars). These data are 
representative of one experiment. h, VSIG-3-Fc binding to activated human 
PBMC T cells at pH 6.0 (green circles) or pH 7.4 (blue squares). Binding of
isotype-matched control antibody at pH 6.0 (black diamond) and pH 7.4 (grey
triangle) is included as a control. Data are VSIG-3-Fc MFI and are representative
of two independent experiments. i, BLI binding magnitudes for VISTA-Fc (left)
and PSGL-1 19-mer-Fc (right) binding to captured VISTA-Fc at the indicated pH.
These data are representative of one experiment. 
Extended Data Fig. 8 | Further characterization of the determinants of
VISTA-PSGL-1 binding. a, BLI binding magnitudes of anti-PSGL-1 clone KPL1 to
captured total, sulfotyrosine-poor, and sulfotyrosine-rich fractions of PSGL-1
19-mer-Fc at pH 6.0 (green) and pH 7.4 (blue). These data are representative of one
experiment. b, BLI binding magnitudes of wild-type PSGL-1 19-mer-Fc (WT, blue)
and tyrosine to alanine mutant PSGL-1 19-mer-Fc (Y>A, green) to captured
VISTA-Fc at the indicated pH. These data are representative of one experiment.
c, BLI binding magnitudes for VISTA.5 (a non-blocking antibody, left) and
VISTA.16 (a blocking antibody, right) binding to captured wild-type (WT, black),
153/154/155 histidine to alanine mutant (H>A, red), 153/154/155 histidine to
aspartic acid mutant (H>D, blue) and 153/154/155 histidine to arginine mutant
(H>R, green) VISTA-Fc proteins at pH 6.0 and pH 7.4. These data are
representative of one experiment. d, SPR binding %Rmax values for VISTA.18 Fab
binding to captured wild-type, H>D mutant, H>R mutant and H>A mutant VISTA
proteins at pH 6.0 (left, green) and pH 7.4 (right, blue). Binding at 25, 50 and 100
nM are indicated by light, medium and dark coloured bars respectively. These
data are representative of one experiment. e, Wild-type and mutant VISTA-Fc
binding to CHO-PSGL-1 cells at pH 6.0. Data are VISTA-Fc MFI and are
representative of two independent experiments. f, Wild-type and mutant
VISTA-Fc suppression of primary T cell NF-κB phosphorylation at pH 6.8. Data
are pNF-κB MFI normalized to control. n = 5 T cell donors; these data are
representative of two independent experiments. g, Additional human VISTA-Fc
recombinant proteins were produced with the histidine residues at positions 98
and 100, or 98, 100, 153, 154 and 155 (quintuple) replaced by arginine. Wild-type
and H>R mutant VISTA-Fc binding to CHO-PSGL-1 cells at pH 7.4. Data are VISTA-Fc
MFI and are representative of two independent experiments.
Extended Data Fig. 9 | Additional analyses of VISTA mouse models. a,
VISTA.10 antibody blockade of mouse VISTA-Fc binding to activated mouse
CD4⁺ T cells at pH 6.0. Binding with no blocking antibody is included as a control.
Data are mouse VISTA-Fc MFI and are representative of two independent
experiments. b-d, MC38 tumour-bearing wild-type mice were treated with PD-1
and VISTA blocking antibodies as described in Fig. 4. n = 5 per group; these data
are representative of three (b) or two (c, d) independent experiments. b, The
frequency of intratumoural CD4⁺ T cells seven days after the start of treatment.
**P = 0.0001. c, LAG-3 and TIM-3 MFI on intratumoral CD8⁺ T cells. ***P < 0.0001.
d, The frequencies of intratumoral leukocytes (CD45⁺, first plot from the left),
macrophages (CD11b*MHCII‘LyC’’LyG'), monocytes (CD11b*LyC"™®"LyG'™) and 
granulocytes (CD11b*LyC’Ly6G"®"), Per cent CD45", ***P=0.0001; per cent 
macrophages, **P = 0.0071; ***P = 0.0009. e, MC38 tumour-bearing VISTA-knockout
mice and their wild-type littermates were treated with control or PD-1
blocking antibodies. Frequencies of intratumoral CD8⁺ (left) and CD4⁺ (right)
T cells seven days after the start of treatment. Per cent CD8⁺, *P = 0.0285; per
cent CD4⁺, *P = 0.0330. n = 4 (KO mice treated with anti-PD-1) or 5 mice per
group; these data are representative of two independent experiments.
f, Schematic of the endogenous (top) and humanized (bottom) VISTA sequence.
g, Representative Southern blot of 4 heterozygous mice and 1 wild-type mouse
(WT) for the humanized VISTA allele (7.1 kb) and the endogenous Vista allele
(6.1 kb). These data are representative of three independent experiments.
h, Expression of mouse and human VISTA on leukocytes from wild-type (blue)
and homozygous human VISTA knock-in mice (red). Non-Treg CD4⁺ T cell
(CD3⁺CD4⁺FoxP3⁻), CD4⁺ Treg cell (CD3⁺CD4⁺FoxP3⁺), CD8⁺ T cell (CD3⁺CD8⁺) and
myeloid cell (CD3⁻B220⁻CD11b⁺) subsets were assessed. Data are per cent
VISTA-expressing and are representative of four independent experiments. i, Wild-type
mice were treated with a single intravenous injection of 200 μg of an
anti-mouse VISTA antibody (red downward triangles) or an isotype-matched control
antibody (blue squares). Data are serum antibody concentrations and are
representative of two independent experiments. Statistics depict
mean ± s.e.m. and one-way ANOVA with Dunnett's multiple comparisons.
Extended Data Fig. 10 | Additional analyses of human VISTA antibodies in
mice and macaques. a, Quantitative tissue biodistribution of fluorescently
labelled VISTA.16 (left) and VISTA.18 (right) at 2.5 h (red), 24 h (green) and 51 h
(blue) after injection into MC38 tumour-bearing human VISTA knock-in mice.
n = 5 (VISTA.16 at 51 h) or 3 mice per group. Data are radiant efficiency mean ±
s.e.m. and are representative of two independent experiments. b-e, Human and
cynomolgus macaque sensorgrams for the antibodies VISTA.4 and VISTA.18 at
pH 7.4 (blue, left) and pH 6.0 (red, right). These data are representative of two
independent experiments.
b, Human VISTA sensorgrams for VISTA.4. c, Human VISTA sensorgrams
for VISTA.18. d, Cynomolgus macaque VISTA sensorgrams for VISTA.4.
e, Cynomolgus macaque VISTA sensorgrams for VISTA.18. f, MC38 tumour-bearing
human VISTA knock-in mice were treated as described in Fig. 4. Tumour
growth in mice treated with VISTA.16 only (left) or with VISTA.18 only (right).
n = 16 mice per group. Data are tumour volumes and are a composite of two
independent experiments.
Data collection Flow cytometry data were acquired with FACSDiva (BD, v 8.0.1) and with CytExpert (Beckman Coulter, v 2.1.0.92).


Luciferase data were acquired with EnVision Manager (PerkinElmer, v 1.14.3049.528). 
Mass spectrometry data were acquired and analyzed with Byonic (Protein Metrics, v3.5). 
Octet data were acquired and analyzed with ForteBio (Molecular Devices, v 9.0.0.48). 
SPR data were acquired with Biacore T200 Control Software (GE Healthcare, v 2.0.2). 
ELISA data were acquired and analyzed by Softmax Pro (Molecular Devices, v 7). 

Crystal structure data were acquired and analyzed by HKL2000 (HKL Research, v 719). 
ITC data were acquired with MicroCal Auto-iTC200 (Malvern Panalytical, v 1.21).

Tissue imaging data were acquired and analyzed with Living Image (PerkinElmer, v 4.5).
Data analysis Flow cytometry data were analyzed by FlowJo (BD, v 10.5.3).


SPR data were analyzed with Biacore T200 Evaluation Software (GE Healthcare, v 3.1). 

ITC data were analyzed by MicroCal PEAQ-ITC (Malvern Panalytical, v 1.21).

VISTA:PSGL-1 structural modeling was performed with Maestro (Schrodinger, v 10.7.015).
PK data were analyzed with Kinetica (ThermoFisher Scientific, v 5.0). 

Statistics were visualized and generated with Prism (GraphPad, v 8.0.2). 
- Accession codes, unique identifiers, or web links for publicly available datasets
- A list of figures that have associated raw data
- A description of any restrictions on data availability


Data generated or analyzed in this study are included in the manuscript. The VISTA crystal structure described in Figure 2 has been deposited with wwPDB and will 
be released upon publication. 
Sample size No sample size calculations were performed. Sample sizes were modeled after those from past experience and publications.
Data exclusions Data were only excluded from assay development and technically failed experiments. In mouse studies, mice that were not enrolled into a
treatment group or who reached an endpoint other than study termination or tumor progression (such as morbidity or tumor ulceration)
were excluded.


Replication All technically successful replicate studies reproduced the indicated results. The number of replicates for each study is indicated. Some 
studies (eg, cynomolgus macaque pharmacokinetics and imaging studies) have only been performed once. 


Randomization In mouse studies, mice were randomized at the start of treatment on the basis of tumor volume.


Blinding Investigators were not blinded. Data reported for mouse studies (tumor measurements and flow cytometry) are non-subjective. 
Materials & experimental systems (n/a | Involved in the study)
Antibodies
Eukaryotic cell lines
Palaeontology
Animals and other organisms
Human research participants
Clinical data

Methods (n/a | Involved in the study)
ChIP-seq
Flow cytometry
MRI-based neuroimaging


Antibodies 


Antibodies used Commercially sourced antibodies and other flow cytometry reagents: 
anti-mouse IgG, polyclonal, Alexa Fluor 647, 1:250, Jackson Immunoresearch 115-605-071 lot 131089 
anti-mouse CD3, 145-2C11, PE-Cy7, 1:200, Biolegend 100320 lot B268542 
anti-mouse CD4, GK1.5, Brilliant Violet 711, 1:400, Biolegend 100447 lot B245638 
anti-mouse CD8a, 53-6.7, Brilliant Violet 786, 1:400, Biolegend 100750 lot B273618 
anti-mouse CD11b, M1/70, PerCP-Cy5.5, 1:200, eBioscience 45-0112-82 lot 1929457 
anti-mouse CD19, 6D5, APC-Cy7, 1:400, Biolegend 115530 lot B253924 
anti-mouse CD45, 30-F11, Brilliant Violet 421, 1:500, Biolegend 103133 lot B263588 
anti-mouse F4/80, BM8, Brilliant Violet 785, 1:200, Biolegend 123141 (lot unavailable)
anti-mouse FoxP3, FJK-16s, FITC, 1:100, eBioscience 11-5773-82 lot 2007700 
anti-mouse Gr1, RB6-8C5, APC, 1:400, BD Biosciences 553129 lot 7121540
anti-mouse LAG-3, eBioC9B7W, PerCP-eFluor 710, 1:200, eBioscience 46-2231-82 lot E15914-105
anti-mouse Ly6C, HK1.4, Brilliant Violet 711, 1:400, Biolegend 128037 lot B247973 

anti-mouse MHC-II, M5/114.15.2, APC-Cy7, 1:200, Biolegend 107628 (lot unavailable)

anti-mouse PD-1, J43, APC-eFluor 780, 1:200, eBioscience 47-9985-82 lot 1999029 

anti-mouse TIM-3, RMT3-23, PE-Cy7, 1:200, eBioscience 25-5870-82 lot 4342910 

anti-mouse VISTA, MIH63, PE, 1:200, Biolegend 150204 (lot unavailable) 

anti-human IgG, polyclonal, Alexa Fluor 647, 1:250, Jackson Immunoresearch 709-605-149 lot 129954 
anti-human CD3e, SK7, PE, 1:10, Biolegend 344806 lot B246230 

anti-human CD4, SK3, Brilliant Violet 650, 1:10, BD Biosciences 563875 lot 7048624 

anti-human CD8a, RPA-T8, Brilliant Violet 785, 1:10, Biolegend 301046 lot B221662 

anti-human CD14, M5E2, Brilliant Violet 421, 1:10, Biolegend 310830 lot B262218 

anti-human CD15, MMA, APC, 1:10, eBioscience 17-0158-42 (lot unavailable) 

anti-human CD19, HIB19, APC-R700, 1:10, BD Biosciences 564977 lot 7160751 

anti-human CD42b, HIP1, APC, 1:10, Biolegend 303912 (lot unavailable) 

anti-human CD56, NCAM 16.2, Brilliant Violet 421, 1:10, BD Biosciences 562752 lot 5295681 
anti-human NFkB pS529, K10-895.12.50, unconjugated, 1:100, BD Biosciences 558393 (lot unavailable) 
anti-human PSGL-1, KPL1, APC, 1:100, BD Biosciences 562758 lot 7125846 

anti-human VISTA, B7H5DS8, APC, 1:10, eBioscience 17-1088-42 (lot unavailable) 

anti-human VSIG-3, polyclonal, unconjugated, 1:100, R&D Systems AF4915 (lot unavailable) 

CellTrace Violet, ThermoFisher C34557 (lot unavailable) 
Live/Dead Fixable Aqua, ThermoFisher L34957 (lot unavailable) 
Streptavidin Dextramer, no clone, PE, 32 nM, Immudex DX01-PE (lot unavailable) 

human VISTA, no clone, biotinylated, 32-890 nM, Acro Biosystems B75-H82E1 (lot unavailable) 
human VISTA-Fc, no clone, unconjugated, 10 ug/mL, R&D Systems 7126-B7-050 (lot unavailable) 
mouse VISTA-Fc, no clone, unconjugated, 10 ug/mL, R&D Systems 7005-B7-050 (lot unavailable) 
Internally generated antibodies: 
anti-human VISTA clones VISTA.4, VISTA.5, VISTA.16, and VISTA.18 
anti-mouse VISTA clone VISTA.5 


Validation VISTA.4 validation is depicted in Extended Data Fig 2F-H. 
VISTA.5 validation is depicted in Extended Data Fig 2F-H. 
VISTA.10 validation is depicted in Extended Data Fig 9A. 
VISTA.16 validation is depicted in Extended Data Fig 4D. 
VISTA.18 validation is depicted in Figure 2, Extended Data Fig 3D, 4D, 10B, and 10D. 


Eukaryotic cell lines 


Policy information about cell lines 


Cell line source(s) 293T, Raji, Jurkat, MC38, and CHO cell lines were obtained from ATCC and modified as described in the methods. Expi293 
cells were obtained from Gibco. 


Authentication No cell line authentication was performed. 


Mycoplasma contamination All cell lines were screened for mycoplasma and found to be negative prior to use. MC38 tumor cells were additionally 
screened for other microbial contaminants and found to be negative prior to use in mouse studies. 


Commonly misidentified lines No commonly misidentified cell lines were used. 
(See ICLAC register) 


Animals and other organisms 


Policy information about studies involving animals; ARRIVE guidelines recommended for reporting animal research 


Laboratory animals Mice used in the study were males and females aged 6-12 weeks. Wildtype (C57BL6), VISTA knockout, and human VISTA knock- 
in strains were used. Cynomolgus macaques were males 6-24 months of age. 


Wild animals This study did not involve wild animals. 
Field-collected samples This study did not involve field-collected samples. 
Ethics oversight Animal studies were conducted in compliance with all relevant ethical regulations. Animal studies performed at Bristol-Myers 
Squibb were approved by the Bristol-Myers Squibb Institutional Care and Animal Use Committee. Animal studies performed at 
Five Prime Therapeutics were approved by the Five Prime Therapeutics Institutional Care and Animal Use Committee. Animal 
studies performed at Murigenics were approved by the Bristol-Myers Squibb Animal Welfare Risk Assessment Team and by the 
Murigenics Institutional Care and Animal Use Committee. 


Note that full information on the approval of the study protocol must also be provided in the manuscript. 


Human research participants 


Policy information about studies involving human research participants 


Population characteristics Human blood donors for experiments were anonymous, and no population information was collected. 
Recruitment Donors were anonymously recruited from Bristol-Myers Squibb employees. 
Ethics oversight Human blood was obtained from a research blood donation program administered by the Bristol-Myers Squibb Occupational 
Health and Wellness department. The program was operated in compliance with all relevant ethical regulations, and written 
informed consent was obtained from all donors. 


Note that full information on the approval of the study protocol must also be provided in the manuscript. 
Plots 


Confirm that: 


The axis labels state the marker and fluorochrome used (e.g. CD4-FITC). 


The axis scales are clearly visible. Include numbers along axes only for bottom left plot of group (a 'group' is an analysis of identical markers). 


All plots are contour plots with outliers or pseudocolor plots. 


A numerical value for number of cells or percentage (with statistics) is provided. 


Methodology 
Sample preparation Samples were prepared as listed in the methods. 
Instrument Fortessa (BD Biosciences) and Cytoflex (Beckman Coulter) instruments were used to acquire flow cytometry data. 
Software All flow cytometry data were analyzed with FlowJo software (BD Biosciences) 


Cell population abundance Describe the abundance of the relevant cell populations within post-sort fractions, providing details on the purity of the samples 
and how it was determined. 


Gating strategy Describe the gating strategy used for all relevant experiments, specifying the preliminary FSC/SSC gates of the starting cell 
population, indicating where boundaries between "positive" and "negative" staining cell populations are defined. 


Tick this box to confirm that a figure exemplifying the gating strategy is provided in the Supplementary Information. 
To safeguard genome integrity in response to DNA double-strand breaks (DSBs), 
mammalian cells mobilize the neighbouring chromatin to shield DNA ends against 
excessive resection that could undermine repair fidelity and cause damage to healthy 
chromosomes’. This form of genome surveillance is orchestrated by 53BP1, whose 
accumulation at DSBs triggers sequential recruitment of RIF1 and the 
shieldin-CST-POLα complex’. How this pathway reflects and influences the three-dimensional 
nuclear architecture is not known. Here we use super-resolution microscopy to show 
that 53BP1 and RIF1 form an autonomous functional module that stabilizes three- 
dimensional chromatin topology at sites of DNA breakage. This process is initiated by 
accumulation of 53BP1 at regions of compact chromatin that colocalize with 
topologically associating domain (TAD) sequences, followed by recruitment of RIF1 to 
the boundaries between such domains. The alternating distribution of 53BP1 and RIF1 
stabilizes several neighbouring TAD-sized structures at a single DSB site into an 
ordered, circular arrangement. Depletion of 53BP1 or RIF1 (but not shieldin) disrupts 
this arrangement and leads to decompaction of DSB-flanking chromatin, reduction in 
interchromatin space, aberrant spreading of DNA repair proteins, and hyper-resection 
of DNA ends. Similar topological distortions are triggered by depletion of cohesin, 
which suggests that the maintenance of chromatin structure after DNA breakage 
involves basic mechanisms that shape three-dimensional nuclear organization. As 
topological stabilization of DSB-flanking chromatin is independent of DNA repair, we 
propose that, besides providing a structural scaffold to protect DNA ends against 
aberrant processing, 53BP1 and RIF1 safeguard epigenetic integrity at loci that are 
disrupted by DNA breakage. 


To study protection of DNA ends in the context of the 3D nuclear archi- 
tecture, we set out to visualize chromatin occupancy by 53BP1. Although 
a typical 53BP1 repair focus appears as a homogenous sphere under 
conventional microscopy, 3D structured illumination microscopy 
(3D-SIM)°** revealed an intrinsically organized compartment consist- 
ing of between four and seven 53BP1-labelled sub-domains assembled 
in an ordered, circular fashion around a central interchromatin space 
(Fig. 1a). Higher-resolution imaging by stimulated emission depletion 
(STED) microscopy’ refined that 53BP1 sub-domains span 60-180 nm 
with a centre-to-centre distance of approximately 140 nm (Extended 
Data Fig. 1a-e). We name these sub-domains 53BP1 nanodomains 
(53BP1-NDs) and their higher-order assembly 53BP1 microdomains 
(53BP1-MDs; Extended Data Fig. 1f). A similar chromatin arrangement 
was detected with different SIM instruments, reproduced using inde- 
pendent antibodies against 53BP1, and validated by visualization of 
endogenous 53BP1 tagged with GFP (Extended Data Fig. 1g-k). The 
53BP1 patterns mirrored those of phosphorylated H2AX (γH2AX) and 
overlapped with core histones (Extended Data Fig. 2a-c), consistent 
with the finding that DSB sites are organized in chromatin nanodo- 
mains®*. A typical 53BP1-MD assembled around one active site of 
DSB repair, exemplified by a single spot of XRCC4 involved in non- 
homologous end joining or RPA engaged in homology-directed repair 
(Fig. 1b, Extended Data Fig. 2d-f). 53BP1-MDs formed in both pre- and 
post-replicative chromatin (Fig. 1a-c), indicating that they represent 
a general response to DSBs. 

Whereas depletion of shieldin subunits (SHLD2, SHLD3) had no 
effect on the 3D arrangement of 53BP1-decorated chromatin (Fig. 1c), 
depletion of RIF1 disrupted 53BP1-MDs into disordered and elongated 
shapes characterized by misaligned 53BP1-NDs (Fig. 1d). We quanti- 
fied this topological disruption using a custom-designed quantita- 
tive nanoscopy texture (QUANTEX) analysis tool, which revealed a 
significant increase in the mean breadth and principal axis length of 
53BP1-MDs (Fig. 1e, f, Extended Data Fig. 3a-c). The topological disruption 
was reproduced by silencing RIF1 with multiple small interfering 
RNAs (siRNAs), by replacing endogenous 53BP1 with a mutant that 
cannot promote RIF1 recruitment”, and in several cancer-derived and 
Fig. 1 | DSBs are surrounded by 53BP1 nanodomains (53BP1-NDs) arranged 
into higher-order 53BP1 microdomains (53BP1-MDs) in a RIF1-dependent 
manner. a, 3D-SIM of GFP-53BP1-MDs in U2OS cells exposed to irradiation (1 Gy, 
2 h). MCM, minichromosome maintenance protein. b, 3D-SIM of GFP-53BP1-MDs 
with immunostained XRCC4 (top) or RPA70 (bottom) in U2OS cells exposed to 
irradiation (IR; 1 Gy) for indicated times. c, d, 3D-SIM of immunostained 53BP1-MDs 
after siRNA-mediated depletion of SHLD2 (c, left), SHLD3 (c, right) and RIF1 
(d) in cells treated as in a. Insets in a, c, d are magnified 53BP1-MDs. e, QUANTEX 
analysis of mean breadth of 53BP1-MDs in cells treated as in c; n=40 per 
condition. f, QUANTEX analysis of mean breadth of 53BP1-MDs in cells treated as 
in d; n=60 per condition. Box plots: centre line, median; box limits, 25th and 
75th centiles; whiskers, minimum and maximum; dots, outliers. NS, not 
significant; P= 0.95, 0.51, 0.60, 0.50 (e, left to right). ****P=3.8003 x 10°’, 1.6698 
x10° (f, left to right). Two-tailed non-parametric Wilcoxon rank-sum test. Cell 
cycle stage was determined by MCM status (MCM+, pre-replicative; MCM-, 
post-replicative). Scale bars, 5 μm (a, c, d) and 200 nm (b and insets in a, c, d). 
Experiments were biologically replicated twice with similar results. For detailed 
image information see Supplementary Table 1. 


non-cancerous cells (Extended Data Fig. 4a-e). Together, these data 
indicate that 53BP1 and RIF1 form an autonomous module in which 
RIF1 is required to stabilize 53BP1-NDs into an ordered, circular chromatin 
architecture (Extended Data Fig. 4f). In support of this concept, 
knockdown of TP53BP1 or RIF1 phenocopied each other by disrupting 
γH2AX-marked chromatin into disordered and elongated shapes 
(Extended Data Fig. 4g-i). 
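
The distinction drawn above between an ordered, circular arrangement and a disordered, elongated one can be illustrated with a minimal sketch. It is not the QUANTEX tool, whose implementation is not described in this section; it assumes a focus is reduced to a handful of 2D nanodomain centroids and uses the ratio of spreads along the two principal axes as a simple elongation score. The function names, the six-point rings and the nanometre values are hypothetical choices made only for the example.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def principal_axis_sd(points: Sequence[Point]) -> Tuple[float, float]:
    """Return (major, minor) standard deviations along the principal axes.

    These are the square roots of the eigenvalues of the 2x2 covariance
    matrix of the point cloud, computed in closed form.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / n
    syy = sum((p[1] - mean_y) ** 2 for p in points) / n
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / n
    half_trace = (sxx + syy) / 2.0
    spread = math.hypot((sxx - syy) / 2.0, sxy)
    major = math.sqrt(half_trace + spread)
    minor = math.sqrt(max(half_trace - spread, 0.0))
    return major, minor


def elongation(points: Sequence[Point]) -> float:
    """Ratio of major to minor spread: about 1 for a ring, larger if stretched."""
    major, minor = principal_axis_sd(points)
    return math.inf if minor == 0 else major / minor


def ring(n: int, rx: float, ry: float) -> List[Point]:
    """Hypothetical test data: n centroids evenly spaced on an ellipse (nm)."""
    return [
        (rx * math.cos(2 * math.pi * k / n), ry * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


if __name__ == "__main__":
    ordered = ring(6, 140.0, 140.0)    # circular arrangement
    stretched = ring(6, 280.0, 70.0)   # elongated arrangement
    print(f"ordered:   {elongation(ordered):.2f}")
    print(f"stretched: {elongation(stretched):.2f}")
```

A score close to 1 corresponds to the circular case; values well above 1 indicate stretching along one axis, which is the qualitative change reported here for RIF1-depleted cells.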

To study how 53BP1 and RIF1 cooperate to stabilize chromatin topology, 
we compared the localization of RIF1 with that of 53BP1. Whereas 
conventional microscopy indicates only a general proximity of 53BP1 
and RIF1 at DSB sites, 3D-SIM and STED revealed that RIF1 localized to 
the chromatin boundaries between neighbouring 53BP1-NDs (Fig. 2a). 
To understand the purpose of this alternating localization, we tracked 
53BP1 dynamics from the pre- to post-damaged state using live-cell 
3D-SIM (live-SIM; Extended Data Fig. 5a). The first 5-10 min after DSB 
Fig. 2 | RIF1 localizes to 53BP1-ND neighbourhoods to stabilize ordered and 
circular architecture of 53BP1-MDs after DNA breakage. a, GFP-53BP1-MDs in 
U2OS cells exposed to irradiation (1 Gy, 2 h), immunostained for RIF1 and 
acquired with conventional (widefield, confocal) or super-resolution (3D-SIM, 
2D-STED) microscopy. Pearson correlation coefficient (PCC) is 0.25 (n=270 
MDs) showing that colocalization of 53BP1 and RIF1 derived from 3D-SIM was 
low. b, Live-SIM recording of an evolving GFP-53BP1-MD at a single DSB induced 
by neocarzinostatin (NCS, 10 ng ml⁻¹). Manual classification of the main 
transition is colour-coded. c, Live-SIM as in b in cells depleted of RIF1. d, 
QUANTEX analysis of mean breadth of γH2AX-MDs in U2OS cells treated with the 
indicated siRNAs at the indicated times after irradiation (1 Gy); n=40 per 
condition. Box plots: centre line, median; box limits, 25th and 75th centiles; 
whiskers, minimum and maximum; dots, outliers. ****P= 8.2676 x10™, 

**P= 1.8363 x10, ****P= 1.9056 x 10°, NS (not significant) P= 0.7366 (left 
panel, left to right); *P= 0.0019, *P= 0.0059, ****P= 3.4337 x 10°, NS P=0.9264 
(right panel, left to right); two-tailed non-parametric Wilcoxon rank-sum test. 

e, QIBC analysis of recruitment of 53BP1 and RIF1 to DSBs in cells treated with 
irradiation (1 Gy) for the indicated times (n=500 cells per condition, data points 
are means of population). f, 3D-SIM of GFP-53BP1-MDs and immunostained RIF1 
in U2OS cells treated with irradiation (1 Gy) for the indicated times. Arrows 
indicate sites of RIF1 recruitment. All scale bars, 200 nm. Experiments were 
biologically replicated twice (a, d-f) or three times (b, c) with similar results. For 
detailed image information see Supplementary Table 1. 


generation were marked by loading of 53BP1 to DSB-flanking chromatin, 
aligned with previous findings obtained by conventional microscopy”. 
In the subsequent 5 min, the 53BP1 pattern matured into distinct 53BP1-NDs 
arranged around a central interchromatin space (Fig. 2b, Extended 
Data Fig. 5b). The dynamics of 53BP1 was mirrored by γH2AX and Halo-tagged 
histone H2B (Extended Data Fig. 6a-c), indicating that it was 
rooted in a chromatin template. Live-SIM analysis of RIF1-depleted 
cells revealed that whereas the initial accumulation of 53BP1 was similar 
to that in wild-type cells, 53BP1-NDs failed to mature to circular 
MDs (Fig. 2c, Extended Data Fig. 6d), leading to asphericity of repair 
foci (quantified as an increase in mean breadth of chromatin marked 
by γH2AX; Fig. 2d). As quantitative image-based cytometry (QIBC)” 
showed no major change in the levels of γH2AX or chromatin-bound 
53BP1 in RIF1-depleted cells, and the number of 53BP1-NDs was not 
altered when analysed by STED (Extended Data Fig. 7a-d), the likely 
cause of topological disruptions in RIF1-depleted cells was an inability to 
stabilize long-range chromatin interactions. Unexpectedly, while these 
data indicate that 53BP1 and RIF1 cooperate in shaping chromatin archi- 
tecture around DSBs, QIBC and laser microirradiation” independently 
revealed a temporal difference in their recruitment. In contrast to 53BP1, 
which was detectable immediately after DNA breakage, RIF1 became 
discernible only 10-15 min later, when 53BP1-decorated chromatin 


Fig. 3 | 53BP1-MDs comprise several TAD-sized chromatin domains whose 
ordered, circular arrangement protects integrity of DSB sites. a, b, 3D-SIM of 
the KIF23-TAD (a; n=15) and the KIF11-TAD (b; n=41) labelled with the dual-colour 
FISH probes (FP-A and FP-B within one TAD; FP-C and FP-C in two TADs); 
pie charts depict co-localization of the FP pairs with 53BP1-NDs. Other denotes 
infrequent arrangements. See Extended Data Fig. 8c, d for undamaged TADs. 
c, d, 3D-SIM of immunostained 53BP1 in HCT116-RAD21-mAID-mClover cells 
untreated (c) or treated (d) with auxin for 6 h. Insets are magnified 53BP1-MDs. 
e, 3D-SIM of GFP-53BP1-MDs in irradiated post-replicative U2OS cells (1 Gy, 2 h), 
immunostained for BRCA1 or RAP80. Localization frequency within 53BP1-MDs 
was 28% (n=100) for central BRCA1 (top), 54% (n=100) for peripheral BRCA1 
(middle), and 41% (n=85) for peripheral RAP80 (bottom). f, 3D-SIM as in top row 
of e after RIF1 depletion. Frequency of aberrantly spread BRCA1 was 85% (n=84). 
g, 3D-SIM of GFP-53BP1-MDs immunostained for RPA70 and treated as in 
e. Localization frequencies were 86% (n=92) for focal RPA70 (top) and 66% 
(n=61) for elongated RPA70 (bottom). h, 3D-SIM as in e after depletion of SHLD2 
or SHLD3. Frequency of increased but focal BRCA1 was 84% (n=119) for SHLD2 
depletion (top) and 73% (n=82) for SHLD3 depletion (bottom). i, ChaiN analysis 
of 53BP1-MDs from wild-type cells or RIF1-depleted cells (n=150 per condition). 
Medians ± 95% confidence intervals (CI). **P=0.0019, 0.0080, 0.0015 (classes 
1-3); NS, P=0.1400, 0.6288, 0.2885, 0.1681 (classes 4-7); two-tailed Student 
t-test. Scale bars, 200 nm (a, b, e-h and insets in c, d); 5 μm (c, d). Experiments in 
c-i were biologically replicated twice with similar results. For detailed image 
information see Supplementary Table 1. 


started to mature into an ordered, circular arrangement (Fig. 2e, f, 
Extended Data Fig. 7e). Although the shieldin complex resembled RIF1 
by localizing to 53BP1-ND neighbourhoods (Extended Data Fig. 7f), 
disruption of shieldin did not impair the spatial arrangement of 53BP1-NDs 
(Fig. 1c, e). Thus, recruitment of RIF1 to DSB sites appears to have 
a unique role in stabilizing chromatin topology initiated by the formation 
of 53BP1-NDs. 

To investigate how the chromatin arrangement at DSB sites influ- 
ences the general principles of 3D nuclear organization”, we used 
CRISPR-Cas9 to introduce single DSBs into TADs that spanned coding 
sequences for the essential mitotic regulators KIF23 and KIF11 
(Extended Data Fig. 8a, b). We then applied RASER-FISH* (resolution 
after single-strand exonuclease resection-fluorescence in situ hybridi- 
zation), a DNA hybridization technique that complements other TAD- 
scale approaches” by allowing the simultaneous detection of labelled 
FISH probes with super-resolution of immunolabelled proteins. While 
the labelled TADs showed a similar appearance regardless of DNA 
damage (Fig. 3a, b, Extended Data Fig. 8c, d), we noticed that the TAD 
signal in the guide-RNA-targeted loci appeared smaller than the size of 
the surrounding 53BP1-MDs (Extended Data Fig. 8e, f). Further investigation 
using 3D-SIM revealed that the labelled KIF23-TAD sequence 
frequently overlapped with a single 53BP1-ND within a given 53BP1-MD 
(Fig. 3a). When the sequences of two neighbouring KIF11-TADs 
(one targeted by guide RNA and the other free of DNA damage) were 
labelled, the RASER-FISH signals colocalized with two distinct 53BP1-NDs 
(Fig. 3b, Extended Data Fig. 8g). Together, these data define a single 
53BP1-MD as a 3D multi-TAD assembly. To test whether the observed 
TAD-like chromatin partitioning might be linked to mechanisms that 
shape 3D nuclear architecture”, we knocked down cohesin subunits 
(RAD21 and SMC1) by siRNA in U2OS cells or depleted RAD21 using 
an auxin-inducible degron in HCT116 cells. In all conditions, cohesin 
deficiency phenocopied RIF1 knockdown by disrupting 53BP1-MDs 
into disordered, elongated shapes without changing the expression 
of 53BP1 or γH2AX (Fig. 3c, d, Extended Data Fig. 9a-j). Thus, RIF1 and 
cohesin cooperate functionally to maintain chromatin topology at 
sites of DNA breakage. 

Disabling non-homologous end joining or homology-directed repair 
(by inhibiting DNA-PK or depleting CtIP, respectively) did not impair the 
formation of ordered and circular 53BP1-MDs (Extended Data Fig. 9k-m), 
raising the possibility that the 53BP1-initiated and RIF1-stabilized 
topological arrangement of DSB-flanking chromatin operates as an 
autonomous 3D structural scaffold for repair reactions. To test this 
idea, we monitored the localization of BRCA1, a DNA-end processing 
regulator that counteracts the chromatin-embedded anti-resection 
barrier”. In wild-type settings, BRCA1 was confined to focal compartments 
either inside or at the periphery of 53BP1-MDs (Fig. 3e). This 
dual localization is likely to reflect BRCA1 subcomplexes, as only the 
outer signal, but not the inner signal, could be recapitulated with 
RAP80 (a component of a BRCA1 sub-complex). In RIF1-depleted 
cells, BRCA1 lost its focal appearance due to massive invasion into 
misshaped chromatin areas (Fig. 3f). This invasion was accompa- 
nied by conversion of the highly focal pattern of RPA into elongated 
structures, indicating excessive DSB resection (Fig. 3g). Whereas 
depletion of two independent shieldin subunits also increased the 
localization of BRCA1 to DSB sites, BRCA1 remained confined to foci 
and the 53BP1-MDs maintained their ordered, circular shape (Fig. 3h). 
To investigate whether the mislocalization of BRCA1 in RIF1-deficient 
settings reflects alterations in the underlying chromatin, we quantified 
histone H2B-GFP occupancy by ChaiN (chain analysis of the in situ 
nucleome)”. Intensity-based segmentation of 3D-SIM images into 
seven discrete classes of H2B-GFP (Extended Data Fig. 10a), ranging 
from class 1 (interchromatin space) to class 7 (most compacted het- 
erochromatin), revealed that 53BP1-MDs featured a distinct distribu- 
tion of chromatin classes. Depletion of RIF1 led this distribution to 
move towards reduced interchromatin space (class 1) and increased 
chromatin decompaction (classes 2 and 3; Fig. 3i). As chromatin 
class distributions in undamaged chromatin remained unchanged 
in RIF1-depleted cells (Extended Data Fig. 10b), we conclude that 
RIF1-mediated enforcement of compact chromatin topology is 
confined to DSB sites. 
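
The idea of sorting chromatin signal into seven intensity classes and comparing class distributions can be sketched as follows. This is not the ChaiN algorithm, whose segmentation procedure is not specified in this section; the sketch assumes the simplest possible scheme, equal-width bins between the minimum and maximum intensity, and the function names and sample values are hypothetical.

```python
from typing import List, Sequence


def classify_intensities(values: Sequence[float], n_classes: int = 7) -> List[int]:
    """Assign each intensity to a class 1..n_classes using equal-width bins.

    Assumption for illustration only: class 1 is the dimmest bin (standing in
    for interchromatin space) and class n_classes the brightest (standing in
    for the most compacted chromatin).
    """
    if n_classes < 1:
        raise ValueError("n_classes must be positive")
    if not values:
        return []
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [1] * len(values)
    width = (highest - lowest) / n_classes
    classes = []
    for value in values:
        index = int((value - lowest) / width) + 1
        classes.append(min(index, n_classes))  # the maximum falls in the top bin
    return classes


def class_fractions(classes: Sequence[int], n_classes: int = 7) -> List[float]:
    """Fraction of elements in each class, in order from class 1 upwards."""
    total = len(classes)
    if total == 0:
        return [0.0] * n_classes
    return [sum(1 for c in classes if c == k) / total for k in range(1, n_classes + 1)]


if __name__ == "__main__":
    # Hypothetical voxel intensities from one region of interest.
    voxels = [0, 1, 2, 3, 4, 5, 6, 7]
    labels = classify_intensities(voxels)
    print(labels)
    print(class_fractions(labels))
```

Comparing the output of `class_fractions` between two conditions is the kind of readout summarized in Fig. 3i, where a shift away from class 1 and towards classes 2 and 3 is reported after RIF1 depletion.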

This study reveals a function for 53BP1 and RIF1 in safeguarding the 
3D structure of genomic loci that have been disrupted by DNA break- 
age (Extended Data Fig. 10c). The ordered topology of DSB-flanking 
chromatin may function as a barrier to enzymes whose uncontrolled 
activity could cause collateral DNA and/or chromatin damage. The 
massive spreading of BRCA1 across the topologically disordered chromatin 
could be just one example of structural disruptions that are 
unleashed in the absence of 53BP1 and RIF1. In addition, the compact 
structure of 53BP1-MDs might increase the local concentrations of 
limiting anti-resection factors such as shieldin, which are among the 
least abundant proteins in the human proteome”*” (Extended Data 
Fig. 10d). Moreover, stabilized chromatin topology could provide a 3D 
scaffold for physiological DSBs, such as in immunoglobulin diversification. 
The finding that 53BP1 and RIF1, but not shieldin, are required for 
long-range chromosomal transactions during immunoglobulin V(D)J 
recombination” is consistent with such a scenario. Finally, as the topological 
arrangement of DSB-flanking chromatin is independent of DNA 
repair, and shieldins are phylogenetically younger than the upstream 
components of the DNA-end protection pathway, we speculate that 
the 53BP1-RIF1 module might have primarily evolved to safeguard 
epigenetic information encrypted in the 3D chromatin structure that 
is challenged by DNA breakage. 
Methods 


Cell culture 

Cells of the human retinal epithelial cell line hTERT-RPE1 (ATCC CRL- 
4000), BJ fibroblasts (ATCC CRL-2522), HeLa Kyoto cervical cancer cell 
line (obtained from S. Narumiya), and U2OS osteosarcoma cell line 
(obtained from Danish Cancer Society) were grown in DMEM contain- 
ing 10% heat-inactivated FBS and penicillin-streptomycin antibiotics. 
The following genetically modified cell lines were used: U2OS cells stably 
expressing mouse 53BP1 N-terminally tagged to EGFP (GFP-53BP1)? 
(1 μg/ml puromycin), U2OS cells with endogenous 53BP1 C-terminally 
tagged with mEGFP (53BP1-GFP), U2OS cells expressing human 
GFP-53BP1-7A mutant (400 μg/ml geneticin), and U2OS cells expressing 
GFP-53BP1 and H2B-HaloTag (1 μg/ml puromycin and 400 μg/ml 
geneticin), U2OS-3xFlag-SHLD3 cells” (obtained from C. Choudhary), 
HeLa H2B-GFP cells (obtained from F. Barr) and human colorectal 
carcinoma HCT116 cells with an integrated RAD21 degron (RAD21-mAID-mClover)”’ 
(obtained from E. Lieberman Aiden). HCT116 cells 
were cultured in McCoy’s 5A modified medium with 10% FBS (100 μg/ml 
hygromycin and 100 μg/ml geneticin). Cells were tested for mycoplasma 
on a regular basis and authenticated by STR profiling (IdentiCell Molecular 
Diagnostics). 


Cell lines and plasmids generated for this study 

U2OS GFP-53BP1-7A mutant cells were generated using plasmid 
pAc-GFP-human 53BP1-7A and selection of single clones according to 
procedures detailed previously. The plasmid was generated by cloning 
of a PCR fragment from Flag-tagged 53BP1-7A (a gift from A. Shibata) into 
vector pAc-GFP-C1 and rendered resistant to TP53BP1 siRNA (Ambion, 
s14313) using site-directed mutagenesis with the forward primer 
CTAGAAGACCAGAAAGAGGGTCGCTCAACTAATAAGGAAAATCC. U2OS 
GFP-53BP1/H2B-HaloTag cells were generated by transfecting the GFP-53BP1 
cell line* with plasmid H2B-HaloTag and selection of clones’. 
Plasmid pHTC-histone H2B-HaloTag was generated by cloning a PCR 
fragment of H2B from an existing H2B-GFP plasmid into the NheI cloning 
site of pHTC HaloTag CMV-neo vector (Promega, G7711) to generate 
a C-terminal HaloTag. U2OS cells homozygously expressing C-terminally 
tagged 53BP1-GFP were generated using CRISPR-Cas9D10A mediated 
homology-directed repair”®: cells were transfected with two 
pX335-U6-Chimeric_BB-CBh-hSpCas9n (D10A) plasmids (Addgene plasmid 
#42335)” expressing Cas9D10A nickase and guide RNAs (antisense: 
AACACAATCTCCACGATAGC, sense: GTGTAACTGGATTCCTTGCA) and 
a donor plasmid containing mEGFP flanked by 900-bp homology arms 
complementary to the C terminus of 53BP1. After 7 days, GFP-positive 
cells were sorted by fluorescence-activated cell sorting (FACS; Sony 
SH800Z cell sorter) to obtain a heterozygous population. The homozygously 
tagged 53BP1-GFP U2OS cell line was obtained by subcloning 
and validated by western blotting and junction PCR (forward primer: 
AAGCAGCACCATTCAAGTGC, reverse primer: TCTGGGCCTTCACCTACCTT) 
followed by Sanger sequencing. The functionality of 53BP1-GFP 
was tested using DNA damage response readouts. 


Generation of DNA breaks 

X-ray irradiation of cells was performed using a YXLON.SMART 160E-1.5 
device (160 kV, 6 mA) delivering 11.8 mGy/s. Soft X-rays were filtered 
using a 3-mm aluminium filter (YXLON International A/S). For laser 
microirradiation-induced DNA damage”, cells were seeded on coverslips 
and treated with 5-bromo-2’-deoxyuridine (24 h, 10 μM, Sigma 
B9285). The coverslip was mounted on the stage of an inverted Zeiss 
Axio Observer microscope equipped with a CryLaS pulsed UV-A laser 
(355 nm), a 40×/0.6 objective and PALM-Robo software (Version 4.5.09, 
Carl Zeiss MicroImaging). Laser energy output was determined by biological 
calibration. For temporal analysis, ten fields were irradiated for 
2.5 min each along a straight-line pattern and after completion (25 min) 
the coverslip was immediately fixed in 4% formaldehyde. To generate 
site-specific DNA breaks, cells were transfected with gRNA-Cas9 ribonucleoprotein 
complexes using Lipofectamine CRISPRMAX Cas9 
(Invitrogen, CMAX00008). CRISPR RNA and trans-activating CRISPR 
RNA were annealed according to the manufacturer’s instructions 
(Integrated DNA Technologies). For transfection of a 35-mm dish (2 ml), 
6.25 μl Cas9 enzyme (TrueCut Cas9 V2, Invitrogen, A36496, 1 mg/mL) 
was diluted in 100 μl Opti-MEM medium followed by addition of 12.5 μl 
duplexed gRNA (2 μM) and 12.5 μl Plus-Reagent from the CRISPRMAX kit. 
CRISPRMAX reagent (7.5 μl) was diluted in 100 μl Opti-MEM medium in 
a separate tube, mixed with the other components, incubated at room 
temperature for 15 min and added to cells. To induce DNA DSBs for 
live-SIM imaging, cells were treated with the radiomimetic neocarzinostatin 
(NCS) at a final concentration of 10 ng ml⁻¹. 
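
As a quick arithmetic check on the irradiation settings quoted above, the sketch below converts the stated dose rate into an exposure time and recomputes the total duration of the microirradiation series. It assumes a constant dose rate with no ramp-up correction; the function names are hypothetical and the exposure time is not a value reported by the authors.

```python
DOSE_RATE_GY_PER_S = 11.8e-3  # 11.8 mGy/s, as stated for the X-ray device


def exposure_seconds(dose_gy: float, dose_rate_gy_per_s: float = DOSE_RATE_GY_PER_S) -> float:
    """Seconds needed to deliver dose_gy at a constant dose rate."""
    if dose_rate_gy_per_s <= 0:
        raise ValueError("dose rate must be positive")
    return dose_gy / dose_rate_gy_per_s


def microirradiation_minutes(fields: int = 10, minutes_per_field: float = 2.5) -> float:
    """Total duration of the laser series: ten fields at 2.5 min each."""
    return fields * minutes_per_field


if __name__ == "__main__":
    print(f"1 Gy takes about {exposure_seconds(1.0):.1f} s")
    print(f"Laser series lasts {microirradiation_minutes():.0f} min")
```

Under this assumption a 1 Gy dose, the dose used throughout the figures, takes roughly a minute and a half to deliver, and the ten-field series adds up to the 25 min given in the text.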


Gene silencing by siRNA 

Transfection of siRNAs (Ambion Silencer Select) was performed with 
Lipofectamine RNAiMAX (Thermo Fisher Scientific, 13778075) at a 
concentration of 20 nM. siRNAs used targeted TP53BP1 (#1 s14314, 
#2 s14313), RAD21 (s11726), RIF1 (#1 s30377, #2 s30378), SMC1A (#1 s15753, 
#2 s15751) and XRCC4 (s14951). Unless stated otherwise, siRNAs #1 were 
used. siRNA against RBBP8”’ has been previously published. Ambion 
negative control #1 was used as control siRNA. 


Other treatment of cells 

The DNA-PK inhibitor NU7441 (Selleckchem) was used at 10 μM, 1 h 
before irradiation. In order to induce RAD21 degradation in the RAD21-mAID-mClover 
cell line’, cells were treated with 500 μM of the auxin 
component 3-indoleacetic acid (IAA; Sigma, I2886). 


Antibodies for immunofluorescence detection and western 
blotting 

The following antibodies were used: 53BP1 (mouse, Millipore, MAB3802, 
1:750 for immunofluorescence (IF)), 53BP1 (rabbit, Novus Biologicals, 
NB100-305, 1:750 for IF, 1:1,000 for western blotting (WB)), 53BP1 (rabbit, 
Novus Biologicals, NB100-304, 1:1,000 for WB), BRCA1 (mouse, Calbio- 
chem, 092, 1:100 for IF), CtIP (mouse, Active Motif, 61141, 1:250 for WB), 
Flag-Tag (mouse, Sigma, F1804, 1:300 for IF), GFP (rabbit, Torrey Pines 
Biolabs, TP401, 1:1,000 for WB), H2AX phospho-S139 (mouse, Abcam, 
ab22551, 1:1,000 for IF), H2AX phospho-S139 (rabbit, Cell Signaling, 9733, 
1:1,000 for IF), HaloTag (mouse, Promega, G921A, 1:1,000 for WB), H2B 
(rabbit, Abcam, ab1790, 1:2,000 for WB), KAP1 (rabbit, Bethyl Labora- 
tories, A300-274A, 1:2,000 for WB), MCM2 (mouse, Novus Biologicals, 
H00004171-M01, 1:200 for IF, 1:1,000 for WB), MCM5 (rabbit, Abcam, 
ab17967, 1:200 for IF), MCM7 (mouse, Santa Cruz, sc-9966, 1:1,000 
for WB), MCMBP (rabbit, Novus Biologicals, NBP1-90746, 1:1,000 for 
WB), NUDC (rabbit, Sigma-Aldrich, HPA027183, 1:1,000 for WB), RAD21 
(mouse, Millipore, 05-908, 1:500 for WB), RAP80 (Bethyl Laboratories, 
A300-764A, 1:400 for IF), RIF1 (rabbit, Bethyl Laboratories, A300-569A, 
1:500 for IF), RIF1 (rabbit, Cell Signaling, 95558, 1:500 for IF, 1:1,000 for 
WB), RPA70 (rabbit, Abcam, ab79398, 1:300 for IF), SMC1 (rabbit, Novus 
Biologicals, NBP2-67733, 1:1,000 for WB), tubulin (mouse, Santa Cruz, 
sc-8035, 1:500 for WB), XRCC4 (rabbit, Abcam, ab213729, 1:100 for IF). 
MCM2 (mouse monoclonal) and MCM5 (rabbit polyclonal) antibodies 
were used to identify pre- and post-replicative cells. Secondary-antibody 
conjugates for immunofluorescence staining (IF) were goat anti-mouse 
and goat anti-rabbit Alexa Fluor 488 (A11029, A11034), Alexa Fluor 568 
(A11031, A11036) and Alexa Fluor 647 (A21236, A21245) reagents (Invitro- 
gen, highly cross-adsorbed). Secondary-antibody conjugates for STED 
were goat anti-mouse and anti-rabbit STAR RED (Abberior, 2-0002-011-2, 
2-0012-011-9) and goat anti-mouse and anti-rabbit STAR 580 (Abberior, 
2-0002-005-1, 2-0012-005-8). For imaging of fixed HeLa cells for H2B- 
GFP by 3D-SIM, a GFP booster was used (Chromotek, gba488, 1:200). 
For live-SIM, H2B-HaloTag-expressing cells were labelled with 200 
nM Janelia Fluor 585 HaloTag ligand (gift from L. Lavis) 20 min before 
image acquisition. 


Western blotting 

Detection of proteins by western blotting was done using standard 
procedures and ECL-based chemiluminescence detection. For gel source 
data, see Supplementary Fig. 1. 


Immunofluorescence staining 

Procedure for standard IF has been previously described. IF for 3D-SIM 
was adapted from previously published protocols. In brief, cells were 
grown on square 18 × 18-mm or 22 × 22-mm #1.5H high-precision 
coverslips (Marienfeld Superior, thickness 0.170 ± 0.005 mm), rinsed in PBS, 
pre-extracted, or not, in ice-cold 0.2% PBS-Triton-X for 1 min on ice, as 
indicated in Supplementary Table 1, and fixed in 4% formaldehyde for 
15 min. Primary and secondary antibodies were diluted in antibody 
diluent (DMEM medium containing 10% FBS and 0.05% sodium azide, 
filtered through a 0.2-µm filter). Coverslips were washed in distilled 
water and mounted on a 30-µl drop of non-hardening Vectashield 
(Vectorlabs, H-1000) or non-hardening Slowfade Diamond (Thermo Fisher 
Scientific, S36963). For DAPI staining, secondary antibody solution was 
supplemented with 4′,6-diamidino-2-phenylindole dihydrochloride 
(DAPI, 0.5 µg/ml). 


FISH probes and labelling 

FISH probes (FP) were generated by labelling bacterial artificial chromo- 
somes (BACs, BACPAC Resources Center, https://bacpacresources.org/) 
with fluorescent dyes. To detect the TAD that harbours the KIF23 gene 
as annotated in the ensemble-annotated Hi-C resource at 10-kb resolution 
(3D Genome Browser, YUE Laboratory, http://promoter.bx.psu.edu/hi-c/view.php), 
we used two adjacent FISH-BAC probes. KIF23 
FP-A is RP11-347N18, labelled with Alexa Fluor 647-aha-dUTP (A32763, 
Invitrogen); KIF23 FP-B is RP11-1150H19, labelled with Alexa Fluor 594 
5-dUTP (C11400, Invitrogen), together spanning nearly the entire TAD 
(hg19:chr15: about 69300000-69750000). The FISH-BAC probe FP-C 
for detecting the TAD that harbours the KIF11 gene (hg19:chr10: about 
94250000-94650000) was BAC probe RP742C13, labelled with Alexa 
Fluor 647-aha-dUTP. The FISH-BAC probe FP-D for the adjacent TAD 
(hg19:chr10: about 94650000-95050000), was RP81C11, labelled with 
Alexa Fluor 594 5-dUTP. Comparison of these TADs in other cell lines and 
other data sets using the Compare Hi-C function of the YUE laboratory 
website (http://promoter.bx.psu.edu/hi-c/view.php) showed that they 
align across different cell lines and Hi-C resolution scales. BAC probes 
were directly labelled by nick translation as described previously. 


RASER-FISH 

RASER-FISH maintains nuclear fine-scale structure by replacing 
heat denaturation with exonuclease III digestion of one of the two 
DNA strands after UV-generation of nicks and is suitable for super- 
resolution image analysis. RASER-FISH was conducted as previously 
described and here was combined with site-specific DSB generation 
and IF staining of 53BP1 to allow visualization of TADs at sites of damage. 
As a counterpart to TADs with DSBs, undamaged TADs (Extended 
Data Fig. 8c, d) were selected by absence of a 53BP1 signal in the volume. 
In brief, U2OS cells were seeded on 22 × 22 mm #1.5H high-precision 
coverslips (thickness 0.170 ± 0.005 mm) and labelled for 24 h with 10 µM 
BrdU/BrdC mix (3:1). Site-specific DSBs were induced by transfection 
with gRNAs for KIF23 or KIF11 (Integrated DNA Technologies, 
Hs.Cas9.KIF23.1.AB; Hs.Cas9.KIF11.1.AA) as described above. Three 
hours after gRNA transfection, cells were fixed with 4% formaldehyde 
(prepared from 16% formaldehyde EM grade ampules) and stained 
for 53BP1 as described above. After incubation with DAPI for UV sensitization 
(0.5 µg/ml, 15 min), cells were treated with UV light (254 nm, 
15 min) and exonuclease III (NEB, 5 U/µl at 37 °C, 15 min). Labelled 
probes were denatured in hybridization mix (90 °C, 10 min) and pre-annealed 
with human Cot-1 DNA (Invitrogen, 37 °C, 15 min) and used 
for hybridization (39 °C, overnight). Coverslips were washed twice in 
1x SSC (37 °C, 30 min) and once in 1x SSC at room temperature. Coverslips 
were washed in PBS, post-fixed in 4% formaldehyde for 10 min, 
rinsed in PBS and MilliQ water and mounted in Slowfade Diamond. 


Microscopy and image analysis 

Detailed information on all images (imaging modalities, microscopy 
setups, fluorophores, image processing, display and analysis) can be 
found in Supplementary Table 1. Image acquisition for QIBC by 
high-content widefield microscopy (ScanR Screening station, Olympus) 
was performed as previously described. Images were processed and 
analysed using ScanR analysis software (Olympus, 2.6.1). Metrics for 
the different objects (number and intensities of nuclei and foci) were 
quantified with single and calculated parameters. These values were 
then exported and visualized using TIBCO Spotfire desktop software 
(version 7.8.0). To visualize overlapping markers, low y-axis jittering was 
applied in scatter plots (random displacement of objects along y-axis). 
Confocal imaging was carried out on an LSM 880 microscope (Zeiss) or 
an UltraView Vox spinning disk system (Perkin Elmer). Super-resolution 
3D-SIM imaging was carried out following previously described protocols, 
using an ELYRA PS.1 microscope system (Zeiss) and a DeltaVision 
OMX V3 Blaze system (GE Healthcare). Computational image reconstruction 
for ELYRA PS.1 was done using theoretical optical transfer functions 
(OTFs) and the Zeiss algorithm (ZEN BLACK). For OMX V3 Blaze, raw 
data were reconstructed using channel-specific OTFs (SoftWoRx 6.1). 
See Supplementary Table 1 for detailed description of imaging modalities, 
image processing and quality controls by SIMcheck. Live-SIM 
was carried out on the DeltaVision OMX V3 Blaze system. Cells were 
seeded in 35-mm glass-bottom dishes (thickness 170 µm ± 5 µm; Ibidi), 
labelled with 200 nM Janelia Fluor 585 HaloTag ligand (gift from L. Lavis) 
20 min before image acquisition and washed in imaging medium (DMEM, 
Gibco 31053028). To induce DNA DSBs, cells were treated with NCS 
(10 ng ml⁻¹). Samples were imaged at 37 °C and 5% CO₂ using an Olympus 
60×/1.42 NA PlanApo N objective and RI 1.520 immersion oil. 3D-SIM 
stacks were acquired over a 0.875-µm-thick (7 z-planes) nuclear mid-section 
to minimize bleaching. To increase throughput, 5-10 nuclei were 
marked per run and 15 raw images per plane were acquired per time-point 
and position. The raw data were computationally reconstructed using 
SoftWoRx 6.1 (GE Healthcare) using channel-specific OTFs as specified 
in Supplementary Table 1. For analysis and display, only those examples 
were selected that could be tracked from before to after damage, stayed 
in focus and did not bleach more than 30% during the whole acquisition. 
STED imaging was performed on an Abberior STED and RESOLFT 775 
QUAD scanning microscope (Abberior Instruments GmbH) using the 
488 nm CW laser and 594 nm and 640 nm pulsed excitation lasers, and 
a pulsed 775 nm STED laser for depletion using a 100×/1.4 NA oil immersion 
objective and a 2D depletion donut for enhancing lateral resolution 
to approximately 50 nm. STED data were analysed and quantified using 
Fiji/ImageJ. 
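
The live-SIM selection rule described above (keep only time series that bleach by no more than 30% over the whole acquisition) can be sketched as a simple filter. This is an illustrative sketch only: how bleaching was measured is not stated in the text, so the use of a mean nuclear intensity per time-point, and the names `bleaching_fraction` and `passes_bleach_criterion`, are assumptions.

```python
def bleaching_fraction(intensities):
    """Fractional signal loss between the first and last time-point.

    `intensities` is assumed to hold one mean nuclear intensity per
    time-point (a hypothetical summary measure, not taken from the paper).
    """
    first, last = intensities[0], intensities[-1]
    if first <= 0:
        raise ValueError("first time-point must have positive intensity")
    return (first - last) / first


def passes_bleach_criterion(intensities, max_loss=0.30):
    """True if the series lost no more than `max_loss` of its initial signal."""
    return bleaching_fraction(intensities) <= max_loss


# Hypothetical example series (arbitrary units).
series_ok = [1000.0, 950.0, 900.0, 820.0]   # 18% loss, kept
series_bad = [1000.0, 800.0, 650.0, 600.0]  # 40% loss, rejected

kept = [s for s in (series_ok, series_bad) if passes_bleach_criterion(s)]
```

The other two criteria in the text (trackable from before to after damage, and staying in focus) are judged per acquisition and are not modelled here.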


3D image analysis using in-house-developed QUANTEX software 

QUAntitative Nanoscopy TEXture analysis (QUANTEX) is a custom image 
analysis software tool with a graphical user interface, developed in 
Matlab (R2018a, MathWorks Inc.) to analyse complex 3D cellular structures. 
The QUANTEX software, manual and webinar can be downloaded from 
https://figshare.com/s/46fa39d1010d77f51d9c. QUANTEX uses 3D 
slice-by-slice segmentation followed by connecting segmented com- 
ponents in 3D. Objects are segmented by processing and segmentation 
algorithms, morphology filtering and advanced watershed algorithms 
and analysed by original (in-house) and MathWorks algorithms for tex- 
ture, geometry and morphology features. For segmentation of nuclei, 
z-stacks were clipped to the minimum number of slices, smoothened 
by Gaussian filter blurring, and then underwent automated weighted 
Otsu-based segmentation. 53BP1-MDs were segmented in this order: 
nuclear background subtraction (Rolling Ball size 3), automated Otsu 
segmentation, morphology filtering (minimum object size 10 voxels). 
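
The Otsu step named in the segmentation order above picks the intensity threshold that maximizes the between-class variance of background and foreground. The following is a minimal, generic sketch of that idea in plain Python; it is not the QUANTEX implementation (which is written in Matlab and uses a weighted variant for nuclei), and the function name `otsu_threshold`, the bin count and the example intensities are assumptions for illustration. Rolling-ball background subtraction and the 10-voxel size filter are not shown.

```python
def otsu_threshold(values, bins=256):
    """Return an intensity threshold by Otsu's method (histogram based)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo  # flat image: nothing to separate
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)
        hist[idx] += 1

    total = len(values)
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    sum_all = sum(c * h for c, h in zip(centers, hist))

    w_bg = 0          # number of voxels at or below the candidate bin
    sum_bg = 0.0      # intensity sum of those voxels
    best_var, best_idx = -1.0, 0
    for i in range(bins - 1):
        w_bg += hist[i]
        sum_bg += centers[i] * hist[i]
        w_fg = total - w_bg
        if w_bg == 0 or w_fg == 0:
            continue
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_idx = between, i
    # Upper edge of the last background bin.
    return lo + (best_idx + 1) * width


# Hypothetical voxel intensities: dim background plus a bright focus.
voxels = [10] * 50 + [12] * 50 + [200] * 20 + [210] * 20
t = otsu_threshold(voxels)
foreground = [v for v in voxels if v > t]
```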


The parameter output of primary and secondary object features was 
exported as an .xlsx document. The two main QUANTEX features used in 
this study are principal axis length and mean breadth. The principal axis 
length feature was implemented in QUANTEX from MathWorks (R2018a, 
MathWorks Inc.) and is a standard metric for the length of the major 
axis of an ellipsoid. Mean breadth is a metric from integral geometry 
and was implemented in QUANTEX from 
https://github.com/mattools/matImage/blob/master/matImage/imMinkowski/imMeanBreadth.m. 
The algorithm computes the integral 
of mean curvature as a Minkowski measure that is estimated from the 
Crofton formula (see detailed information in the QUANTEX manual and 
webinar; https://figshare.com/s/46fa39d1010d77f51d9c). Steps for calculating 
mean breadth from a 3D binary object are as follows: (i) calculate 
the number of voxels within the object (nv); (ii) calculate the number 
of connected components in the three main directions x, y and z (ncx, ncy, 
ncz); (iii) calculate the number of square faces on the planes with normal 
directions x, y and z (nfx, nfy and nfz); (iv) calculate mean breadth (MB) 
in the x direction, MBx = nv - (ncy + ncz) + nfx; the y direction, MBy = nv - (ncx + 
ncz) + nfy; and the z direction, MBz = nv - (ncx + ncy) + nfz. The mean breadth of an 
object = (MBx + MBy + MBz)/3. Principal axis length and mean breadth 
each measure the maximum linear dimension of 3D objects. Both meas- 
ures consistently give significant P values and robustly discriminate 
between globular and elongated 53BP1-MDs. A Spearman’s correlation 
score (test of association between both measures) of R² = 0.59 (Extended 
Data Fig. 3c) shows that they carry similar but not identical information: 
59% of variation in mean breadth is explained by principal axis length and 
41% of variation in mean breadth is independent of the latter. Wilcoxon 
tests show that mean breadth more robustly discriminates between 
globular and elongated shapes of 53BP1-MDs and is less susceptible 
to geometrical outliers; for these reasons, it was chosen as the main 
measure in this study. 
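
The association test between the two metrics can be illustrated with a small, self-contained sketch: Spearman's coefficient is the Pearson correlation of the rank-transformed values, and squaring it gives the R² quoted above. The function names and the example measurements below are hypothetical and are not data from this study.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-object measurements (arbitrary units).
mean_breadth = [1.0, 2.0, 3.0, 4.0, 5.0]
principal_axis_length = [2.0, 1.0, 4.0, 3.0, 5.0]

rho = spearman(mean_breadth, principal_axis_length)
r_squared = rho ** 2  # share of rank variation common to both metrics
```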


Image analysis for ChaiN method 

This image analysis pipeline was used to extract chromatin density 
distribution within 53BP1-MDs in an automated manner. Reconstructed 
and aligned multichannel 3D-SIM micrographs of chromatin 
and 53BP1-MDs were split into their single channel components and 
53BP1-MDs were thresholded by Otsu algorithm and by size exclusion 
(excluding signal from antibody noise). The H2B chromatin channel 
was segmented into seven arbitrary classes implementing a hidden 
Markov model (HMM), where class 1 denotes no detectable chromatin 
(interchromatin compartment, IC), and classes 2-7 denote increasing 
levels of chromatin compaction. The 53BP1-MD volumes are used to 
mask the segmented chromatin, giving the distribution of chromatin 
density within these volumes. Aggregating these distributions over 
all sub-volumes for all images yields an average distribution for each 
density class as a percentage within class-specific statistical confidence 
ranges. As a control, the whole nuclear volume can also be taken to 
analyse whether the chromatin distribution changes genome-wide, 
outside 53BP1-MDs. This workflow runs on free and open source soft- 
ware (Octave and R). Scripts used can be found at https://github.com/ 
ezemiron/Chain. 
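
The masking step of this pipeline (using the 53BP1-MD volume to select voxels of the class-segmented chromatin channel and reporting each density class as a percentage) can be sketched as follows. This is not the ChaiN code, which runs in Octave and R; it is a simplified illustration that assumes the HMM segmentation has already produced a class label (1 to 7) per voxel, and the function name and flattened example arrays are hypothetical.

```python
from collections import Counter


def class_distribution(class_map, mask, n_classes=7):
    """Percentage of masked voxels falling in each chromatin density class.

    class_map: per-voxel class labels (1 = interchromatin, 2..7 = increasing
               chromatin density), flattened to a sequence.
    mask:      per-voxel truthy/falsy values marking the 53BP1-MD volume.
    """
    counts = Counter(c for c, inside in zip(class_map, mask) if inside)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * n_classes
    return [100.0 * counts.get(k, 0) / total for k in range(1, n_classes + 1)]


# Hypothetical flattened volume of eight voxels.
classes = [1, 1, 2, 3, 7, 7, 7, 4]
md_mask = [1, 0, 1, 1, 1, 1, 0, 0]

distribution = class_distribution(classes, md_mask)
```

Passing a mask of the whole nucleus instead of the 53BP1-MD mask gives the control distribution mentioned in the text.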


RNA sequencing data source 

RNA sequencing (RNA-seq) data for TP53BP1, RIF1 and SHLD1 transcripts 
were derived from publicly available RNA-seq data sets at EMBL-EBI 
expression atlas (https://www.ebi.ac.uk/gxa/home). Original data 
sources are: NIH Genomic Data Commons Cell lines CCLE osteosarcoma 
(U2OS), Sanger Genomics of Drug Sensitivity in Cancer Project GDSC 
Cancer Genome Project uterine cervix/cervical carcinoma (HeLa #1), 675 
Genentech uterine cervix/cervical adenocarcinoma (HeLa #2), RNA-seq 
of long poly-adenylated RNA and long non-polyadenylated RNA from 
ENCODE cell lines/total RNA/whole cell (IMR90) and Genentech RNA 
seq of 675 commonly used human cancer cell lines (HBL100, breast, 
normal at time of derivation). 


Statistics and reproducibility 

The two-tailed Student’s t-test was used to test Gaussian distributed per- 
class data in ChaiN analysis. The two-tailed non-parametric Wilcoxon 
rank-sum test for equal medians was used for all data underlying box 
plots except in Extended Data Fig. 7d. Here, the Cochran-Armitage 
chi-square test was applied to compare the frequency distribution of 
an ordinal variable between different conditions. Spearman’s correlation 
coefficients and their R² values were calculated for metrics mean 
breadth and principal axis length derived from control (negative class) 
and RIF1 depletion data (positive class) and combined in order to test 
the association between the metrics. The Pearson correlation coef- 
ficient was used to quantify the degree of colocalization between two 
fluorophores. Experiments were not randomized and no blinding was 
used during data analysis. Sample size was not pre-determined. Sample 
size, statistical tests and the number of biological replicates for each 
experiment are indicated in the figure legends. 
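
For readers unfamiliar with the rank-sum test used for the box-plot data, the sketch below shows a two-tailed version based on the large-sample normal approximation. It is a generic illustration, not the authors' analysis code: it counts pairwise wins (the Mann-Whitney U form, which is equivalent to the rank sum), applies no tie or continuity correction, and uses hypothetical example values.

```python
import math


def rank_sum_test(x, y):
    """Two-tailed Wilcoxon rank-sum test via the normal approximation.

    Returns (z, p). U counts pairs where a value from x exceeds a value
    from y, with ties counted as one half.
    """
    n1, n2 = len(x), len(y)
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed P value
    return z, p


# Hypothetical measurements for two conditions.
control = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
depleted = [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
z_sep, p_sep = rank_sum_test(control, depleted)

# Interleaved samples: no evidence of a shift.
z_mix, p_mix = rank_sum_test([1, 3, 5, 7], [2, 4, 6, 8])
```

For small samples or heavily tied data an exact test or a tie-corrected variance would be more appropriate than this approximation.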


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Code availability 


Custom ChaiN code is available at https://github.com/ezemiron/Chain. 
Custom QUANTEX code is available from the corresponding author 
upon reasonable request. 


Data availability 


Numerical and statistical source data for Figs. 1e, f, 2d, e, 3a, b, e-i and 
Extended Data Figs. 1d, e, 2c, e, f, 3b, 4d, 5b, 6c, d, 7a, b, d, 8c, d, 9c, 10b, d 
are provided online. Primary imaging data underlying widefield, confocal, 
SIM and STED images in Figs. 1a-d, 2a-c, f, 3a-h and Extended Data 
Figs. 1c, i-k, 2a-c, 4b, c, e, f, h, i, 5b, 6a, c, d, 7c, e, f, 8b-g, 9b, f-h, k, l have 
been deposited at the European Bioinformatics Institute (EBI) BioStudies 
database (https://www.ebi.ac.uk/biostudies/) with accession number 
S-BSST275. Processed imaging data sets underlying QIBC, QUANTEX, 
ChaiN and other analysis, including guidance on how to navigate data 
sets, are available from the corresponding authors. There are no restric- 
tions on data availability. 


Extended Data Fig. 1 | Spatial features of 53BP1-MDs at sites of DNA 
breakage. a, Experimentally derived resolution for STED and 3D-SIM 
instruments using nano-bead imaging under identical conditions as for image 
data acquisition at the indicated excitation wavelengths. Line profile is average 
of three lines; dotted line shows fit of a double Gaussian distribution, where the 
peak-to-peak distance indicates spatial resolution. b, Western blot of GFP-53BP1 
U2OS cells immunostained for 53BP1, GFP and loading controls (NUDC, tubulin). 
c, 3D-SIM and STED images of immunostained 53BP1-MDs in U2OS cells exposed 
to irradiation (1 Gy, 2 h). Images were processed identically for pixel numbers 
and bicubic interpolation smoothing for direct comparison. d, Diameter of a 
53BP1-ND in pre- and post-replicative cells determined by full width at half 
maximum (FWHM, n = 75) from STED data in c. e, Centre-to-centre peak distance 
(n = 85) of 53BP1-NDs from STED data in c. Box plots: centre line, median; box 
limits, 25th and 75th centiles; whiskers, minimum and maximum; dots, outliers. 
*P = 0.0356 (d), P = 0.8587 (e; NS, not significant); two-tailed non-parametric 
Wilcoxon rank-sum test. Pre- or post-replicative chromatin assigned based on 
MCM⁺ or MCM⁻ status. f, Schematic depiction of 53BP1-MD. g, Western blot of 
U2OS cells with endogenously tagged 53BP1-GFP immunostained for 53BP1, 
GFP and loading control (MCM2). h, Junction PCR showing homozygous 53BP1 
tagging. i-k, 3D-SIM of 53BP1-MDs in endogenously tagged U2OS-53BP1-GFP 
cells (i) or U2OS cells immunostained with mouse (j) or rabbit (k) 53BP1 
antibodies, exposed to irradiation (1 Gy, 2 h). Scale bars, 100 nm (a); 200 nm 
(c, i-k). Experiments in b, d, e, g-k were biologically replicated twice with similar 
results. For detailed image information see Supplementary Table 1. For gel 
source data see Supplementary Fig. 1. 


[Extended Data Fig. 2 image panels not reproduced. Recoverable labels: γH2AX, H2B, DAPI, 53BP1, 53BP1 + RPA, 53BP1 + XRCC4, control and XRCC4 siRNA; plot axes: fluorescence intensity (AU) versus distance (µm).] 


Extended Data Fig. 2 | 53BP1-MD relation to underlying chromatin. a, 3D-SIM 
of GFP-53BP1-MDs in U2OS cells exposed to irradiation (1 Gy, 2 h) and 
immunostained for γH2AX (PCC = 0.93, n = 300 MDs) shows high colocalization 
of 53BP1 and γH2AX. b, STED of a γH2AX-MD in U2OS cells treated as in a. 
c, 3D-SIM of three different z-planes of HeLa cells expressing histone H2B-GFP, 
treated with 10 ng ml⁻¹ NCS for 2 h and immunostained for γH2AX. Nuclear DNA 
was visualized by DAPI. Insets are magnified γH2AX-MDs. Intensity line profiles 
of the three fluorophores (along the white line in the insets) show colocalization 
of chromatin with γH2AX-MDs. d, Western blotting of U2OS cells treated with 
control or XRCC4 siRNA immunostained for XRCC4 and loading marker (KAP1). 
e, f, Intensity line profiles of 53BP1-MDs with XRCC4 (e) and RPA (f) in cells 
treated as in Fig. 1b; six independent examples per condition are shown. 
Fluorescence intensities in c, e, f were normalized to the maximum value of each 
profile. Scale bars, 200 nm in a, b, insets in c and 5 µm in whole-nucleus images 
(c). Experiments were biologically replicated twice with similar results. For 
detailed image information see Supplementary Table 1. For gel source data see 
Supplementary Fig. 1. 

experiments as in Fig. 1a, d and represent a parallel data analysis to the metric 
mean breadth in Fig. 1f; n= 60. Box plots: centre lines, medians; box limits, 25th 
and 75th centiles; whiskers, minimum and maximum; dots, outliers. 

**P= 9.4329 x 10° (left), 2.3092 x 10° (right); two-tailed non-parametric 
Wilcoxon rank-sum test. The experiment was biologically replicated twice with 


Extended Data Fig. 4 | Disruption of ordered, circular arrangement of 
DSB-flanking chromatin after depletion of RIF1 or 53BP1. a, Western blotting of 
U2OS cells treated with control or two RIF1 siRNAs immunostained for RIF1 and 
loading marker (tubulin). b, 3D-SIM of GFP-53BP1-MDs in U2OS cells transfected 
with RIF1 siRNA #2 and treated as in Fig. 1d. c, 3D-SIM of 53BP1-MDs in U2OS cells 
expressing siRNA-resistant GFP-53BP1-7A mutant and depleted for endogenous 
53BP1 (#2 siRNA), exposed to irradiation (1 Gy, 2 h) (top). A schematic depiction 
of 53BP1-7A where glutamines in 7 SQ/TQ sites are converted to alanines 
(bottom). d, Distribution of circular with central interchromatin space 
(IC centre) versus elongated (no IC centre) 53BP1-MDs in U2OS, HeLa Kyoto, 
RPE1-hTERT and BJ cells (n = 130 per condition) in control or RIF1-depleted cells 
treated with irradiation (1 Gy, 2 h). e, 3D-SIM of immunostained 53BP1-MDs in 
U2OS, HeLa Kyoto, RPE1-hTERT and BJ cells after control or RIF1 depletion and 
irradiation exposure (1 Gy, 2 h). f, A representative 3D view of an ordered, circular 
arrangement of GFP-53BP1-NDs in wild-type conditions (top) and disordered, 
elongated shapes after RIF1 depletion (bottom). MIP, maximal intensity 
projection; 3D opacity view is displayed in three orientations (V1-3) indicated by 
coloured arrows. All 3D-SIM images in this study were routinely inspected in this 
way. g, Western blotting of U2OS cells treated with TP53BP1 siRNA and 
immunostained for 53BP1 and loading marker (NUDC). h, 3D-SIM of γH2AX-MDs 
in U2OS cells transfected with TP53BP1 siRNA and exposed to irradiation (1 Gy, 
2 h). i, 3D-SIM of GFP-53BP1-MD in U2OS cells immunostained for γH2AX and 
treated as in Fig. 1d. Insets in b, c, h represent magnified single 53BP1-MDs. Scale 
bars, 5 µm in whole-nucleus images (b, c, h), 200 nm in e, f, i and insets (b, c, h). 
Experiments were biologically replicated twice with similar results. For detailed 
image information see Supplementary Table 1. For gel source data see 
Supplementary Fig. 1. 

[Extended Data Fig. 6 image panels not reproduced. Recoverable labels: GFP-53BP1, Halo-H2B, 53BP1, U2OS; time-points 5 min, 10 min, 15 min, 30 min, 2 h; line profiles of 53BP1 + H2B (Halo-H2B, GFP-53BP1) against distance (µm); transition stages: no damage, loading/expansion, misalignment. Live 3D-SIM of 53BP1 (RIF1 siRNA): relative frequency of misalignment into disordered, elongated shapes = 80%; n (cells) = 10; n (53BP1-MDs) = 59.] 

Extended Data Fig. 6 | Live-SIM imaging of 53BP1-MDs with the underlying 
chromatin and after RIF1 depletion. a, 3D-SIM of immunostained γH2AX-MDs 
in control or 53BP1-depleted U2OS cells treated with irradiation (1 Gy) for the 
indicated times. b, Western blotting of U2OS cells expressing GFP-53BP1 and 
H2B-Halo-Tag immunostained for 53BP1, GFP, H2B, Halo-Tag and loading marker 
(MCMBP). c, Live-SIM depicting an evolving GFP-53BP1-MD at a single 
H2B-HaloTag-labelled chromatin locus after induction of DSBs by NCS (10 ng ml⁻¹) for 
the indicated time-points. Insets are magnified 53BP1-MDs. Intensity line 
profiles of the two fluorophores (along the white lines in the insets) show 
colocalization of underlying chromatin with the 53BP1-MD. Fluorescence 
intensities were normalized to the maximum value of each profile. d, Additional 
examples of live-SIM of cells treated as in Fig. 2c. Image galleries for seven fields 
from four independent acquisitions are displayed. Manual classification of 
transition stages is colour-coded. Experiments in a-c were biologically 
replicated twice with similar results. Scale bars, 200 nm in a, d, and insets in c; 
1 µm in large fields in c. For detailed image information see Supplementary 
Table 1. For gel source data see Supplementary Fig. 1. 

Extended Data Fig. 7 | Analysis of RIF1 depletion, shieldin localization, and 
RIF1 recruitment dynamics in the context of DSB-flanking chromatin. 
a, b, QIBC of fluorescence intensities associated with γH2AX-MDs (a; n = 1,000 
cells per condition) and 53BP1-MDs (b; n = 1,800 cells per condition) in control or 
RIF1-depleted cells treated with irradiation (1 Gy) as indicated. Box plots: centre 
line, median; box limits, 25th and 75th centiles; whiskers, minimum and 
maximum; dots, outliers. ***P=2.0631 x 10 °°, **P=4.8803 x 10 °*, NS P=0.8651 
(a, left to right); ***P= 3.887 x 10°, NS P= 0.7172 (b, left to right); two-tailed non- 
parametric Wilcoxon rank-sum test. c, Confocal and STED acquisitions of 
immunostained 53BP1-MDs in U2OS cells treated with control or RIF1 siRNAs, 
exposed to irradiation (1 Gy, 2 h) and displayed as single and overlay images. 
d, Counts of 53BP1-NDs per 53BP1-MD quantified from STED images in c (n = 70 
per condition); horizontal bar shows median, P = 0.2711 (left), 0.9566 (right); 
Cochran-Armitage chi-square test. e, U2OS cells expressing endogenously 
tagged 53BP1-GFP were treated by laser microirradiation and immunostained 
for γH2AX and RIF1. Stars indicate times when γH2AX, 53BP1 and RIF1 were first 
detected at DSBs. f, 3D-SIM of 53BP1-MD and 3x-Flag-SHLD3 in U2OS cells 
exposed to irradiation (1 Gy, 2 h) and immunostained for 53BP1 and Flag-tag (six 
independent examples are shown). Scale bars, 200 nm (c, f), 20 µm (e). 
Experiments were biologically replicated twice with similar results. For detailed 
image information see Supplementary Table 1. For gel source data see 
Supplementary Fig. 1. 

Extended Data Fig. 8 | RASER-FISH analysis of 53BP1-MDs at site-specific 
DSBs in KIF23 and KIF11 loci. a, Depiction of a 0.45-Mb TAD from a reference 
cell line (adapted from Yue laboratory 3D genome browser, see Methods) 
harbouring the KIF23 gene (top) and a 0.4-Mb TAD harbouring the KIF11 gene 
(bottom). Sites of CRISPR-Cas9 site-specific DSBs and a position of each 
RASER-FISH probe are indicated. b, DAPI-stained U2OS cells transfected with Cas9 
ribonucleoprotein complexes with control, KIF23-, or KIF11-targeting guide 
RNAs (gRNA). Arrows indicate examples of mitotic aberrations inflicted by 
KIF23 and KIF11 knockout. c, d, 3D-SIM of the KIF23-TAD (c) and the KIF11-TAD (d) 
RASER-FISH probes in cells treated as in Fig. 3a, b but at loci without DNA 
damage (no 53BP1 signal). Dual-colour FISH probes FP-A and FP-B are located 
within the same TAD in c; FP-C and FP-D in two adjacent TADs in d. e, Widefield 
microscopy of immunostained 53BP1-MDs at the damaged KIF23-TAD locus 
labelled by FP-B in U2OS and RPE1-hTERT cells 3 h after transfection with KIF23 
gRNA-Cas9. Insets (MD1-3) are magnified 53BP1-MDs shown in xy, xz and yz 
orientations. f, Widefield microscopy of immunostained 53BP1-MDs at the 
damaged KIF11-TAD locus labelled by FP-C in U2OS cells 3 h after transfection 
with KIF11 gRNA-Cas9. Insets (MD1-3) were generated as in e. g, 3D-isosurface 
projections (V1-3) of 3D-SIM images of FP-C- and FP-D-labelled KIF11 TADs after 
DNA damage induction as shown in Fig. 3b. Scale bars, 5 µm in whole-nucleus 
images (e, f), 200 nm in insets (e, f) and in c, d, 20 µm in b. Experiments in b-f 
were biologically replicated twice with similar results. For detailed image 
information see Supplementary Table 1. 

Extended Data Fig. 9 | Disruption of ordered, circular arrangement of 
DSB-flanking chromatin after cohesin depletion. a, Western blotting of 
HCT116-RAD21-mAID-mClover cells treated with auxin (aux) as indicated and 
immunostained for RAD21 and loading marker (NUDC). b, Widefield images of 
HCT116-RAD21-mAID-mClover cells, either untreated or treated with auxin for 
6 h to induce RAD21 degradation. c, QUANTEX analysis of mean breadth of 
53BP1-MDs in cells treated as in Fig. 3c, d (n = 110). Box plots: centre line, median; 
box limits, 25th and 75th centiles; whiskers, minimum and maximum; dots, 
outliers. ****P=3.8495 x 10” for MCM*’, ****P= 7.636 x 10 for MCM ; two-tailed 
non-parametric Wilcoxon rank-sum test. d, Western blotting of U2OS cells 
treated with control or RAD21 siRNA, immunostained for RAD21 and loading 
marker (tubulin). e, Western blotting of U2OS cells treated with control or SMC1 
siRNA, immunostained for SMC1 and loading marker (MCMBP). f-h, 3D-SIM of 
GFP-53BP1-MDs in U2OS cells transfected with RAD21 siRNA (f), SMC1 siRNA #1 
(g), or SMC1 siRNA #2 (h) and exposed to irradiation (1 Gy, 2 h). i, Western 
blotting of U2OS cells treated with the indicated siRNAs and immunostained for 
γH2AX; total protein stain is loading control. j, Western blotting of U2OS cells 
treated with indicated siRNAs and immunostained for 53BP1 and loading marker 
(MCM7). k, l, 3D-SIM of GFP-53BP1-MDs in U2OS cells treated with 10 µM DNA-PK 
inhibitor (k) or RBBP8 (also known as CtIP) siRNA (l) and exposed to 
irradiation (1 Gy, 2 h). m, Western blotting of U2OS cells treated with control or 
RBBP8 siRNA, immunostained for CtIP and loading marker (NUDC). Insets in 
(f-h, k, l) are magnified 53BP1-MDs. Scale bars, 5 µm in whole nuclei (f-h, k, l), 
200 nm in insets (f-h, k, l), 20 µm in b. Experiments were biologically replicated 
twice with similar results. For detailed image information see Supplementary 
Table 1. For gel source data see Supplementary Fig. 1. 

Extended Data Fig. 10 | Chromatin density analysis by ChaiN, RNA-seq data, 
and a schematic model for topological surveillance of DSB loci. a, Schematic 
depiction of ChaiN analysis to quantify chromatin density in 3D-SIM images 
based on histone H2B-GFP distribution. Reconstructed and aligned 3D-SIM 
images were used to segment volumes occupied by 53BP1-MDs and subjected to 
an HMM process to derive seven discrete GFP-H2B chromatin density classes 
within the segmented region. Class 1 represents chromatin-free interchromatin 
space, while classes 2-7 feature increasing chromatin densities. An equivalent 
analysis of the whole nucleus serves as a control for global chromatin 
distributions outside 53BP1-MDs. b, ChaiN analysis in undamaged nuclei in 
wild-type or RIF1-depleted cells (n = 12 per condition). Median ± 95% CI. *P = 0.0348, 
0.0226 (class 2 and 4), NS P = 0.2525, 0.7373, 0.0990, 0.4874, 0.9496 (classes 1, 3, 
5-7); two-tailed Student’s t-test. c, A hypothetical model. A DSB triggers 
accumulation of 53BP1 in the damaged and several neighbouring chromatin 
nanodomains. Saturation of 53BP1 at chromatin nanodomains prompts 
recruitment of RIF1 to the boundaries between them. Through functional 
crosstalk with cohesin, RIF1 locally stabilizes the nanodomain topology into an 
ordered and circular microdomain, which confines repair factors such as BRCA1 
to DSBs and locally concentrates shieldin-CST-POLα to restrain DNA-end 
resection. Absence of RIF1 leads to topological disorder that results in excessive 
spreading of BRCA1, inability to concentrate DNA-end protection factors and 
DSB hyper-resection. d, RNA-seq data for TP53BP1, RIF1 and SHLD1 transcripts 
per million kilobases in cancerous cells (U2OS, HeLa) and normal cells (IMR90, 
HBL100). Data were derived from publicly available RNA-seq data in the 
EMBL-EBI expression atlas (see Methods). Scale bars (a), 5 µm in the whole nucleus and 
200 nm in the magnified 53BP1-MD image (right). For detailed image 
information see Supplementary Table 1. 


Statistical parameters 


When statistical analyses are reported, confirm that the following items are present in the relevant location (e.g. figure legend, table legend, main 
text, or Methods section). 


n/a | Confirmed 


The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement 


An indication of whether measurements were taken from distinct samples or whether the same sample was measured repeatedly 


The statistical test(s) used AND whether they are one- or two-sided 
Only common tests should be described solely by name; describe more complex techniques in the Methods section. 


A description of all covariates tested 


A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons 


A full description of the statistics including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND 
variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals) 


For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted 
Give P values as exact values whenever suitable. 


For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings 


For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes 


Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated 


Clearly defined error bars 
State explicitly what error bars represent (e.g. SD, SE, CI) 


Our web collection on statistics for biologists may be useful. 


Software and code 


Policy information about availability of computer code 


Data collection 
ScanR Acquisition software (Olympus, V2.7.1, for ScanR high-content microscopy) 
Volocity acquisition software (Perkin Elmer, V6.3, for Ultraview Vox spinning disk microscopy) 
ZEN Black acquisition software (Zeiss, for ELYRA PS.1 SIM microscopy and LSM880 confocal microscopy) 
OMX SoftWoRx software (GE Healthcare, V6.1, for OMX V3 Blaze SIM microscopy) 
Imspector software (Abberior Instruments, for Abberior STED microscopy) 
PALM-Robo software (Zeiss, 4.5.09, for laser-microirradiation microscopy) 


Data analysis 
ScanR Analysis (Olympus, V2.7.1, for QIBC) 
Fiji/ImageJ with SIMcheck plug-in (for SIM microscopy quality control) 
QUANTEX (in-house custom software developed for this study, for 3D image analysis of sub-cellular structures) 
MATLAB (MathWorks, R2018b) 
R (3.6.1, for coding of QUANTEX software and data plotting of ChaiN data) 
Excel (Microsoft, 2016, for image analysis data management) 
Spotfire (Tibco, V7.8.0.1.20, for data visualization and statistics) 
Imaris (Bitplane, V x64 9.0.2, for visualization of 3D-SIM images) 
Volocity (Perkin Elmer, V6.3, for image viewing) 


For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers 
upon request. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information. 
Policy information about availability of data 
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: 


- Accession codes, unique identifiers, or web links for publicly available datasets 
- A list of figures that have associated raw data 
- A description of any restrictions on data availability 


Numerical and statistical source data for Figs. 1e,f, 2d,e, 3a,b,e,f,g,h,i and Extended Data Figs. 1d,e, 2c,e,f, 3b, 4d, 5b, 6c,d, 7a,b,d, 8c,d, 9c, 10b,d have been 
provided with this manuscript. Primary imaging data underlying widefield, confocal, SIM and STED images in Figs. 1a,b,c,d, 2a,b,c,f, 3a,b,c,d,e,f,g,h and Extended 
Data Figs. 1c,i,j,k, 2a,b,c, 4b,c,e,f,h,i, 5b, 6a,c,d, 7c,e,f, 8b,c,d,e,f,g, 9b,f,g,h,k,l has been deposited at the European Bioinformatics Institute (EBI) BioStudies database 
(https://www.ebi.ac.uk/biostudies/) with accession number S-BSST275. Processed imaging datasets underlying QIBC, QUANTEX, ChaiN and other analysis, including 
guidance on how to navigate datasets, are available from the corresponding author upon reasonable request. 

There are no restrictions on data availability. 
Life sciences study design 


All studies must disclose on these points even when the disclosure is negative. 


Sample size No statistical methods were used to predetermine sample size, as this study did not include animal models or human participants. Sample size 
was determined based on standards in the field and experimental experience to obtain statistical significance and reproducibility. 


Data exclusions All data acquired for this study were included in the analysis, with rare exceptions where images did not pass quality controls: for 3D-SIM 
microscopy, images that showed more than 50% bleaching during acquisition and did not pass the SIMcheck quality control step for 
artefact-free image reconstruction were discarded. In 3D-SIM live cell imaging, images were discarded if the cell moved out of focus or out of 
the field of view during image acquisition. In QUANTEX-based image analysis, subcellular structures of interest (e.g. 53BP1-MDs) that were touching 
the boundary of the image frame, and objects that could not be faithfully segmented for further image analysis (e.g. over-segmentation of 
closely apposed touching structures of high intensity or under-segmentation of low-intensity objects), were not further analysed. 


Replication All experimental findings were reliably reproduced in multiple independent experiments as indicated in the figure legends. All attempts of 
replication were successful. 


Randomization No randomization was done, because this study does not involve animals or human participants. Samples were organized into groups based 
on treatments (e.g. untreated or control siRNA treated compared to target-specific siRNAs; experimental time-points). Appropriate controls 
were included in all experiments. 


Blinding There was no blinded group allocation. All data that passed quality controls (see data exclusion) were analyzed by unbiased automated image 
analysis or under strict internal standards for objective manual image analysis. 
Unique biological materials 


Policy information about availability of materials 


Obtaining unique materials Cell lines generated for this study (U2OS cells homozygously tagged with 53BP1-mEGFP; U2OS cells stably expressing 
EGFP-53BP1 and Histone H2B-HaloTag, U2OS cells stably expressing GFP-53BP1-7A mutant) will be made available upon 
reasonable request. Other cell lines that were published previously by our laboratory and used in this study can be obtained 
upon reasonable request. 


Antibodies 


Antibodies used 
Antibody, Host, Supplier, Cat. number, Lot number, Technique and concentration 
53BP1, Mouse, Millipore, MAB3802, 3106536, IF 1:750 
53BP1, Rabbit, Novus Biologicals, NB100-305, A4, IF 1:750, WB 1:1000 
53BP1, Rabbit, Novus Biologicals, NB100-304, F, WB 1:1000 
BRCA1, Mouse, Calbiochem, 092, D00168480, IF 1:100 
CtIP, Mouse, Active Motif, 61141, 03618004, clone 14-1, WB 1:250 
FLAG-Tag, Mouse, Sigma-Aldrich, F1804, SLBT 7654, IF 1:300 
GFP, Rabbit, Torrey Pines Biolabs, TP401, 081211, WB 1:1000 
H2AX phospho-S139, Mouse, Abcam, ab22551, GR226782-1, IF 1:1000 
H2AX phospho-S139, Rabbit, Cell Signaling, 9733, 8, IF 1:1000 
HaloTag, Mouse, Promega, G921A, 0000261985, WB 1:1000 
Histone H2B, Rabbit, Abcam, ab1790, GR310932, WB 1:2000 
KAP1, Rabbit, Bethyl Laboratories, A300-274A, 3, WB 1:2000 
MCM2, Mouse, Novus Biologicals, H00004171-M01, 13211-6A8, IF 1:200, WB 1:1000 
MCM5, Rabbit, Abcam, ab17967, GR249182-35, IF 1:200 
MCM7, Mouse, Santa Cruz, sc-9966, E2015, WB 1:1000 
MCMBP, Rabbit, Novus Biologicals, NBP1-90746, A115112, WB 1:1000 
NUDC, Rabbit, Sigma-Aldrich, HPA027183, R12662, WB 1:1000 
RAD21, Mouse, Millipore, 05-908, 3135877, clone 53A303, WB 1:500 
RAP80, Rabbit, Bethyl Laboratories, A300-764A, 1, IF 1:400 
RIF1, Rabbit, Bethyl Laboratories, A300-569A, 5, IF 1:500 
RIF1, Mouse, Santa Cruz, sc-515573, C0216, IF 1:500 
RIF1, Rabbit, Cell Signaling, 95558, 1, IF 1:500, WB 1:1000 
RPA70, Rabbit, Abcam, ab79398, GR212113-23, IF 1:300 
C1, Rabbit, Novus Biologicals, NBP2-67733, HJO910, WB 1:1000 
Tubulin, Mouse, Santa Cruz, SC-8035, C3012, WB 1:500 
XRCC4, Rabbit, Abcam, ab213729, GR302718-2, IF 1:100 
Goat anti-mouse Alexa Fluor 488, Goat, Invitrogen, A11029, 1942237, IF 1:1000 
Goat anti-rabbit Alexa Fluor 488, Goat, Invitrogen, A11034, 1937195, IF 1:1000 
Goat anti-mouse Alexa Fluor 568, Goat, Invitrogen, A11031, 2026148, IF 1:1000 
Goat anti-rabbit Alexa Fluor 568, Goat, Invitrogen, A11036, 1924788, IF 1:1000 
Goat anti-mouse Alexa Fluor 647, Goat, Invitrogen, A21236, 1793803, IF 1:1000 
Goat anti-rabbit Alexa Fluor 647, Goat, Invitrogen, A21245, 1805235, IF 1:1000 
Goat anti-mouse STAR RED, Goat, Abberior, 2-0002-011-2, 26062018Hp, IF 1:1000 
Goat anti-rabbit STAR RED, Goat, Abberior, 2-0012-011-9, 09072018CW/JR, IF 1:1000 
Goat anti-mouse STAR 580, Goat, Abberior, 2-0002-005-1, IF 1:1000 
Goat anti-rabbit STAR 580, Goat, Abberior, 2-0012-005-8, 12102016HP, IF 1:1000 
Goat anti-rabbit IgG (H&L) Peroxidase labelled, Goat, Vector Laboratories, PI-1000, ZE0614, WB 1:10000 
Horse anti-mouse IgG (H&L) Peroxidase labelled, Horse, Vector Laboratories, PI-2000, ZC212, WB 1:10000 




Validation All antibody validations were done with cell line samples derived from human origin. 
53BP1, Mouse, Millipore, MAB3802, validated by IF after 53BP1 siRNA in our laboratory 
53BP1, Rabbit, Novus Biologicals, NB100-305, validated by IF after 53BP1 siRNA in our laboratory 
53BP1, Rabbit, Novus Biologicals, NB100-304, validated by WB (Extended Data Fig. 4b) 
BRCA1, Mouse, Calbiochem, 092, D00168480, IF 1:100, validated by IF after BRCA1 siRNA in our laboratory 
CtIP, Mouse, Active Motif, 61141, validated by WB after CtIP siRNA (Extended Data Fig. 9m) 
FLAG-Tag, Mouse, Sigma-Aldrich, F1804, see manufacturer information (https://www.sigmaaldrich.com/catalog/product/sigma/f1804?lang=en&region=DK) 
GFP, Rabbit, Torrey Pines Biolabs, TP401, validated by WB (Extended Data Fig. 1b) 
H2AX phospho-S139, Mouse, Abcam, ab22551, validated by WB (Extended Data Fig. 9i) and by IF in our laboratory 
H2AX phospho-S139, Rabbit, Cell Signaling, 9733, validated by IF in our laboratory 
HaloTag, Mouse, Promega, G921A, validated by WB (Extended Data Fig. 6b) 
Histone H2B, Rabbit, Abcam, ab1790, validated by WB (Extended Data Fig. 6b) 
KAP1, Rabbit, Bethyl Laboratories, A300-274A, 3, validated by WB (Extended Data Fig. 2d) 
MCM2, Mouse, Novus Biologicals, H00004171-M01, validated by IF in our laboratory and WB (Extended Data Fig. 1g) 
MCM5, Rabbit, Abcam, ab17967, validated by IF in our laboratory 
MCM7, Mouse, Santa Cruz, sc-9966, validated by WB (Extended Data Fig. 9l) 
MCMBP, Rabbit, Novus Biologicals, NBP1-90746, validated by WB (Extended Data Figs. 6b, 9e) 
NUDC, Rabbit, Sigma-Aldrich, HPA027183, R12662, validated by Human Protein Atlas (https://www.sigmaaldrich.com/catalog/product/sigma/f1804?lang=en&region=DK) 
RAD21, Mouse, Millipore, 05-908, 3135877, validated by WB (Extended Data Fig. 9d) 
RAP80, Rabbit, Bethyl Laboratories, A300-764A, see manufacturer information (https://www.bethyl.com/product/A300-764A?referrer=search) 
RIF1, Rabbit, Bethyl Laboratories, A300-569A, see manufacturer information (https://www.bethyl.com/product/A300-569A?referrer=search) 
RIF1, Rabbit, Cell Signaling, 95558, validated by WB (Extended Data Fig. 4a) 
RPA70, Rabbit, Abcam, ab79398, see manufacturer information (https://www.abcam.com/rpa70-antibody-epr3472-ab79398.html) 
C1, Rabbit, Novus Biologicals, NBP2-67733, validated by WB (Extended Data Fig. 9e) 
Tubulin, Mouse, Santa Cruz, SC-8035, see manufacturer information (https://www.scbt.com/scbt/product/alpha-tubulin-antibody-tu-02?requestFrom=search) 
XRCC4, Rabbit, Abcam, ab213729, validated by WB (Extended Data Fig. 2d) 
Eukaryotic cell lines 


Policy information about cell lines 
Cell line source(s) RPE1-hTERT cells and BJ fibroblasts were obtained from ATCC. The parental U2OS cell line was originally obtained from Ed 
Harlow’s lab in Boston (1990s) and maintained in the cell line repository of the Danish Cancer Society. U2OS cell lines 
expressing fluorescently tagged proteins (GFP-53BP1 WT or 53BP1-7A mutant, H2B-Halo/GFP-53BP1 or endogenously tagged 
53BP1-GFP) are derivatives of the parental U2OS cell line. U2OS cells stably overexpressing 3xFLAG-RINN1/Shld3 were obtained 
from Chuna Choudhary. HeLa-Kyoto cells were obtained from Dr. S. Narumiya. HeLa cells stably expressing Histone H2B-GFP 
were obtained from Francis Barr. HCT116 cells with an endogenously integrated RAD21-mAID-mClover were obtained from 
Erez Lieberman Aiden. 


Authentication All cell lines have been authenticated by STR profiling. 
Mycoplasma contamination All cell lines are routinely (two month basis) tested for mycoplasma using PCR-based methods (LONZA) and always found 
negative. 


Commonly misidentified lines (See ICLAC register) Cell lines used in this study were not listed in the 
commonly misidentified category. 


Article 


Metabolic regulation of gene expression by 
histone lactylation 


https://doi.org/10.1038/s41586-019-1678-1 


Received: 21 June 2018 


Accepted: 13 September 2019 


Published online: 23 October 2019 


Di Zhang, Zhanyun Tang, He Huang, Guolin Zhou, Chang Cui, Yejing Weng, 
Wenchao Liu, Sunjoo Kim, Sangkyu Lee, Mathew Perez-Neut, Jun Ding, Daniel Czyz, 
Rong Hu, Zhen Ye, Maomao He, Y. George Zheng, Howard A. Shuman, Lunzhi Dai, 
Bing Ren, Robert G. Roeder, Lev Becker & Yingming Zhao 


The Warburg effect, which originally described increased production of lactate in 
cancer, is associated with diverse cellular processes such as angiogenesis, hypoxia, 
polarization of macrophages and activation of T cells. This phenomenon is intimately 
linked to several diseases including neoplasia, sepsis and autoimmune diseases. 
Lactate, which is converted from pyruvate in tumour cells, is widely known as an energy 
source and metabolic by-product. However, its non-metabolic functions in physiology 
and disease remain unknown. Here we show that lactate-derived lactylation of histone 
lysine residues serves as an epigenetic modification that directly stimulates gene 
transcription from chromatin. We identify 28 lactylation sites on core histones in 
human and mouse cells. Hypoxia and bacterial challenges induce the production of 
lactate by glycolysis, and this acts as a precursor that stimulates histone lactylation. 
Using M1 macrophages that have been exposed to bacteria as a model system, we show 
that histone lactylation has different temporal dynamics from acetylation. In the late 
phase of M1 macrophage polarization, increased histone lactylation induces 
homeostatic genes that are involved in wound healing, including Arg1. Collectively, our 
results suggest that an endogenous ‘lactate clock’ in bacterially challenged M1 
macrophages turns on gene expression to promote homeostasis. Histone lactylation 
thus represents an opportunity to improve our understanding of the functions of 
lactate and its role in diverse pathophysiological conditions, including infection and 
cancer. 


Inspired by the discovery of various histone acylations derived from 
cellular metabolites, we predicted and identified lysine lactylation 
(Kla) as a new type of histone mark that can be stimulated by lactate 
(Fig. 1a). Initial evidence for histone Kla came from the observation of 
a mass shift of 72.021 Da on lysine residues in three proteolytic peptides 
that were detected in high-performance liquid chromatography 
(HPLC)-tandem mass spectrometry (MS/MS) analysis of tryptically 
digested core histones from human MCF-7 cells (Fig. 1b and Extended 
Data Fig. 1b, d). This mass shift is the same as that caused by the addition 
of a lactyl group to the ε-amino group of a lysine residue. 
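
The reported 72.021 Da shift can be checked from elemental composition. The sketch below assumes that the net addition on the lysine ε-amino group is C3H4O2 (lactic acid, C3H6O3, minus one water) and uses standard monoisotopic atomic masses; the function and variable names are illustrative and are not taken from the study.

```python
# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC_MASS = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}


def monoisotopic_mass(composition):
    """Sum the monoisotopic masses for an element -> count mapping."""
    return sum(MONOISOTOPIC_MASS[element] * count
               for element, count in composition.items())


# Assumed net change on lysine when a lactyl group is added:
# lactic acid (C3H6O3) minus water (H2O) = C3H4O2.
LACTYL_ADDITION = {"C": 3, "H": 4, "O": 2}

lactyl_shift = monoisotopic_mass(LACTYL_ADDITION)
print(f"{lactyl_shift:.3f} Da")  # 72.021 Da
```

Under these assumptions the computed value agrees with the observed shift to three decimal places.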

To validate the existence of lysine lactylation in histones, we used four 
orthogonal methods. In the first two methods, we used HPLC-MS/MS 
analysis to compare a synthetic peptide with its in vivo-derived counterpart 
to determine whether the two versions of the peptide have the 
same chemical properties in terms of chromatographic elution in HPLC 
analysis and fragmentation pattern in MS/MS analysis. To achieve this, 
we generated three histone peptides bearing Kla modifications: H3K23-QLATK(la)AAR; 
H2BK5-PELAK(la)SAPAPK; and H4K8-GGK(la)GLGK. Each pair 
of peptides co-eluted in HPLC and had comparable MS/MS spectra 
(Fig. 1b and Extended Data Fig. 1a-e). To confirm the modification further, 
we developed a pan anti-Kla antibody (Extended Data Fig. 1f, g). 
Immunoblots using the pan anti-Kla antibody confirmed the presence 
of histone Kla and showed that histone Kla levels were increased in a 
dose-dependent fashion in response to exogenous L-lactate (Extended 
Data Fig. 1h-j). Subsequent mass spectrometry analyses identified 26 
and 16 histone Kla sites from human HeLa cells and mouse bone marrow-derived 
macrophages (BMDMs), respectively (Fig. 1c). Finally, metabolic 
labelling experiments using isotopic sodium L-lactate (13C3) followed 
by MS/MS analysis demonstrated that lysine lactylation can be derived 
from lactate (Extended Data Fig. 1k). Together, these experiments demonstrate 
that histone Kla is an in vivo protein post-translational modification 
derived from lactate. 
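
Metabolic labelling implies a predictably heavier modification. The following is a minimal sketch, assuming that all three carbons of the lactyl group come from the 13C3-labelled lactate and that the unlabelled shift is 72.02113 Da (C3H4O2); the names are hypothetical and the numbers are standard isotope masses, not values reported in the study.

```python
# Mass difference between one 13C atom and one 12C atom (Da).
C13_MINUS_C12 = 13.00335484 - 12.0

# Assumed monoisotopic shift of an unlabelled lactyl addition (C3H4O2).
UNLABELLED_LACTYL_SHIFT = 72.02113


def labelled_lactyl_shift(n_labelled_carbons):
    """Expected lysine mass shift when n lactyl carbons are 13C."""
    return UNLABELLED_LACTYL_SHIFT + n_labelled_carbons * C13_MINUS_C12


heavy_shift = labelled_lactyl_shift(3)
print(f"{heavy_shift:.3f} Da")  # 75.031 Da
```

A peptide carrying the labelled group would therefore be expected to appear about 3.010 Da heavier than its unlabelled counterpart.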
Fig. 1 | Identification and validation of histone Kla. a, Illustration of Kla 
structure. b, MS/MS spectra of a lactylated histone peptide (H3K23la) derived 
from MCF-7 cells (in vivo), its synthetic counterpart, and their mixture. The b ion 


Histone Kla is regulated by glycolysis 


Given that extracellular lactate can stimulate histone Kla, we hypothesized 
that modulation of intracellular lactate production would also affect 
histone Kla levels. We exposed MCF-7 and other cell lines to various 
concentrations of glucose, the major source of intracellular lactate. 
Both lactate production and histone Kla levels were induced by glucose 
in a dose-dependent manner (Fig. 2a, b and Extended Data Fig. 2a-c). 
Conversely, the non-metabolizable glucose analogue 2-deoxy-D-glucose 
(2-DG) decreased both lactate production and histone Kla levels (Fig. 2c, d). 
Furthermore, metabolic labelling experiments using isotopic glucose 
(U-13C6) followed by MS/MS analysis demonstrated that lysine lactylation 
is endogenously derived from glucose (Extended Data Fig. 2d and Supplementary 
Table 1). Quantitative proteomics analysis across a diverse 
set of histone sites demonstrated that histone Kla and Kac have different 
kinetics of U-13C6-glucose incorporation in MCF-7 cells (Extended Data 
Fig. 2e, f). 13C-labelled histone Kac reached a steady state at 6 h, similar 
to the previous observation in HCT116 cells. By contrast, histone Kla 
increased over a 24-h time course (Extended Data Fig. 2e, f). Immunoblotting 
results corroborated the MS/MS data in cell lines such as MCF-7 
cells (Extended Data Fig. 2i-k). 

Lactate production is determined by the balance between glycolysis 
and mitochondrial metabolism. We tested whether the activities of 
enzymes in these two pathways can modulate lactate levels that in turn 
refers to the N-terminal parts of the peptide, and the y ion refers to the 
C-terminal parts of the peptide. Data represent two independent experiments. 
c, Illustration of histone Kla sites identified in human and mouse cells. 


regulate histone Kla (Fig. 2e). Sodium dichloroacetate (DCA) and oxamate 
were used to inhibit lactate production by modulating the activities 
of pyruvate dehydrogenase and lactate dehydrogenase, respectively. As 
anticipated, intracellular levels of lactate were decreased by these two 
compounds (Fig. 2f) and levels of histone Kla were lowered (Fig. 2g, h). 
By contrast, rotenone, an inhibitor of the mitochondrial respiratory 
chain complex I that drives cells towards glycolysis, increased levels of 
both intracellular lactate and histone Kla (Fig. 2f, i). Quantification of 
histone Kla and Kac marks by stable isotope labelling with amino acids in cell 
culture (SILAC) and MS/MS analyses corroborated the immunoblot data 
from DCA- and rotenone-treated MCF-7 cells (Extended Data Fig. 2l, m). 
Furthermore, labelling experiments with U-13C6-glucose showed that 
the incorporation of 13C into histone Kla but not Kac was decreased by 
DCA (Extended Data Fig. 2e-h). Together, these observations demonstrate 
that endogenous production of lactate is a key determinant of 
histone Kla levels. 


Hypoxia and bacterial exposure stimulate histone Kla 

Increased glycolysis and lactate production are coupled with diverse 
cellular processes. To investigate whether histone Kla is regulated by 
glycolysis under physiological conditions, we chose two model systems: 
hypoxia and M1 macrophage polarization. In response to hypoxia, cells 
reprogram their metabolism by inhibiting oxidative phosphorylation 
Fig. 2 | Lactate regulates histone Kla. a-d, Intracellular lactate levels (a, d) and 
histone Kla levels (b, c) were measured from MCF-7 cells cultured in different 
concentrations of glucose or 2-DG in the presence of 25 mM glucose for 24 h. 
Lactate was measured by a lactate colorimetric kit. n = 3 biological replicates; 
statistical significance was determined using one-way ANOVA followed by 
Sidak’s multiple comparisons test. Immunoblots were performed using acid-extracted 
histone samples. The pan anti-Kla and anti-Kac immunoblots indicate 
molecular masses between 10 and 15 kDa. e, Regulation of glycolysis and lactate 
production by diverse metabolic modulators. f, Intracellular lactate levels were 
measured in MCF-7 cells treated with indicated glycolysis modulators for 24 h. 


and enhancing glycolysis, which stimulates the production of lactate. 
Hypoxia induced intracellular production of lactate and increased histone 
Kla levels but not Kac levels in MCF-7 cells (Fig. 2j, k and Extended 
Data Fig. 3a-d). SILAC-based mass spectrometric quantification of histone 
Kla and Kac confirmed the immunoblotting data (Extended Data 
Fig. 3e, f). Similar results were obtained in HeLa and RAW 264.7 cells 
(Extended Data Fig. 3g, h). Furthermore, we found that the induction 
of lactate production and histone Kla by hypoxia was attenuated by a 
lactate dehydrogenase inhibitor (oxamate) or a PDK1 inhibitor (DCA) 
(Extended Data Fig. 3i, j). Deletion of both LDHA and LDHB fully suppressed 
production of lactate and histone Kla in HepG2 cells under normoxic 
conditions (Extended Data Fig. 3k, l). Owing to poor cell viability, 
hypoxic conditions could not be tested (data not shown). 

Emerging evidence shows that lactate has regulatory functions in 
both innate and adaptive immune cells and induces marked changes in 
gene expression, suggesting that lactate is not simply a ‘waste product’ 
of glycolysis. Pro-inflammatory M1 macrophages undergo metabolic 
reprogramming towards aerobic glycolysis, resulting in lactate production, 
whereas anti-inflammatory M2 macrophages trigger a metabolic 
program of increased oxidative phosphorylation and fatty acid oxidation. 
Our discovery of histone Kla marks and their dynamics therefore 
suggests a role in regulating gene expression during M1 macrophage 
polarization. 

To test this hypothesis, we examined the dynamics of lactate production 
and histone Kla marks during M1 macrophage polarization after 
n = 3 biological replicates; statistical significance was determined using one-way 
ANOVA followed by Dunnett’s multiple comparisons test. g-i, Immunoblots of 
acid-extracted histones (rotenone and DCA) or whole-cell lysates (oxamate) 
from MCF-7 cells in response to different glycolysis modulators. j, Intracellular 
lactate levels were measured in MCF-7 cells in response to hypoxia. n = 4 
biological replicates; statistical significance was determined using unpaired 
two-tailed t-test. k, Immunoblots of acid-extracted histones from MCF-7 cells 
under hypoxia (1% oxygen) for indicated time points. Data in a, d, f and j are mean 
and s.e.m. Data in b, c, g-i and k represent three independent experiments. 
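
The legend reports data as mean and s.e.m. As a small worked example of that summary, the sketch below computes both from a set of replicate measurements. The replicate values are invented for illustration only and are not data from the study; the helper name is hypothetical.

```python
import math
import statistics


def mean_and_sem(values):
    """Return (mean, standard error of the mean) for replicate values.

    s.e.m. = sample standard deviation / sqrt(n), with n - 1 in the
    denominator of the standard deviation.
    """
    n = len(values)
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, sem


# Hypothetical lactate readings from n = 3 biological replicates.
replicates = [4.0, 5.0, 6.0]
mean_value, sem_value = mean_and_sem(replicates)
print(f"mean = {mean_value:.2f}, s.e.m. = {sem_value:.2f}")
```

With only three or four replicates, as used in these panels, the s.e.m. is itself a rough estimate, which is one reason to show individual data points alongside it.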


treatment of BMDMs with lipopolysaccharide (LPS) and interferon-γ 
(IFNγ). We observed increased intracellular lactate levels 16-24 h after 
M1 activation (Fig. 3a), which correlated with increased histone Kla 
levels (Fig. 3b, c). By contrast, histone Kac levels were decreased at 
these time points (Fig. 3b, c). This differential pattern was confirmed 
by U-13C6-glucose labelling experiments, in which 13C-labelled histone Kac 
peaked 3 h after labelling and declined to a steady state, whereas histone 
Kla increased over the 24-h time course (Extended Data Fig. 4a-d). In 
addition, the LDHA-specific inhibitor GNE-140 reduced 13C incorporation 
into histone Kla, but not Kac (Extended Data Fig. 4e, f). The increase of 
histone Kla during M1 polarization is intrinsic and not due to paracrine 
effects, because replenishing cells with fresh media every 4 h did not 
affect Kla levels (Extended Data Fig. 4g). Increases in lactate production 
and histone Kla are also specific to M1 macrophages because they 
were not observed in M2-polarized BMDMs (Fig. 3d and Extended Data 
Fig. 4h), which are more reliant on fatty acid oxidation. 


Histone Kla induces M2-like genes in M1 macrophages 


Histone modifications have an important role in the regulation of gene 
expression. To investigate histone Kla-associated genes 24 h after M1 
polarization of macrophages, we performed RNA sequencing (RNA-seq) 
and paired chromatin immunoprecipitation followed by sequencing 
(ChIP-seq) using anti-H3K18la or anti-H3K18ac antibodies (the specificities 
of which were validated by dot blots) (Extended Data Fig. 3a-d), 
Fig. 3 | Increased histone Kla during M1 macrophage polarization is 
associated with M2-like gene activation. a-c, BMDMs were activated with LPS 
and IFNγ. a, Intracellular lactate levels were measured using a lactate 
colorimetric kit. n = 3 biological replicates; statistical significance was 
determined using one-way ANOVA followed by Dunnett’s multiple comparisons 
test. b, c, Histone acylations were analysed by immunoblots using whole-cell 
lysates. ImageJ was used for quantification; n = 3 technical replicates. Data 
represent two independent experiments. d, BMDM cells were stimulated with 
PBS (M0), LPS and IFNγ (M1), and IL-4 (M2) for 24 h, respectively. Acid-extracted 
histones were used for immunoblots. e, f, Scatter plot (e) and bar plot (f) 
showing genes with promoters marked by exclusively increased H3K18la 
(H3K18la-log2[M1/M0] ≥ 1 and H3K18ac-log2[M1/M0] < 0.5, H3K18la-specific); 
increased in both H3K18la and H3K18ac (H3K18la-log2[M1/M0] ≥ 1 and H3K18ac-log2[M1/M0] 
≥ 0.5, shared); or exclusively increased H3K18ac (H3K18ac-log2 


ChIP and quantitative PCR (qPCR) assays (Extended Data Fig. 4i, j) and 
immunoblots (Extended Data Fig. 4k). 

Our ChIP-seq data showed that H3K18la and H3K18ac were both 
enriched in promoter regions (±2 kb around transcriptional start 
sites) (Extended Data Fig. 4l) and were indicative of steady-state mRNA 
levels (Extended Data Fig. 4m, n). In addition, increased H3K18la (twofold 
increase) marked more genes than decreased H3K18la (twofold 
decrease), whereas the converse was true for the H3K18ac modification 
(Fig. 3e). Moreover, most genes marked by increased H3K18la were 
specific, because 68% of these genes (1,223 out of 1,787) did not display 
significantly increased H3K18ac (Fig. 3e, f and Supplementary Tables 2, 3). 
By contrast, no H3K18ac-specific genes were identified (Fig. 3e, f). Representative 
tracks from ChIP-seq studies are shown in Extended Data 
Fig. 4o, p. 
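
The gene categories described here can be expressed as a simple threshold rule. The sketch below is an illustration only: it assumes the cut-offs read from the Fig. 3 legend (log2[M1/M0] of at least 1 for the increased mark, and below 0.5 for the other mark to call a gene specific), and the gene names and fold-change values are invented. It also checks the reported proportion of H3K18la-specific genes.

```python
def classify_promoter(la_log2fc, ac_log2fc):
    """Assign a promoter to a category from H3K18la and H3K18ac
    log2(M1/M0) values, using thresholds assumed from the legend."""
    if la_log2fc >= 1 and ac_log2fc < 0.5:
        return "H3K18la-specific"
    if la_log2fc >= 1 and ac_log2fc >= 0.5:
        return "shared"
    if ac_log2fc >= 1 and la_log2fc < 0.5:
        return "H3K18ac-specific"
    return "unclassified"


# Hypothetical promoters: (H3K18la log2FC, H3K18ac log2FC).
promoters = {
    "geneA": (1.8, 0.1),
    "geneB": (1.2, 0.9),
    "geneC": (0.2, 1.4),
    "geneD": (0.3, 0.2),
}
labels = {name: classify_promoter(la, ac)
          for name, (la, ac) in promoters.items()}
print(labels)

# Proportion reported in the text: 1,223 of 1,787 genes.
fraction_specific = 1223 / 1787
print(f"{fraction_specific:.0%}")  # 68%
```

Note that a twofold increase corresponds to a log2 fold change of 1, which is why the threshold for an increased mark is set at 1 in this sketch.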

To study correlations between H3K18la marks and gene expression, 
we performed RNA-seq analysis 0, 4, 8, 16 and 24 h after challenge with 
LPS and IFNγ (Extended Data Fig. 5a and Supplementary Table 4). As 
expected, inflammatory response genes (for example, Nos2) were 
induced as early as 4 h after challenge with LPS and IFNγ, and their expression 
levels steadily declined at later time points (Fig. 3g). Notably, the 
1,223 genes specifically marked by increased H3K18la were more likely 
to be activated or reactivated at later time points (16 or 24 h) during M1 
polarization (Fig. 3h and Extended Data Fig. 5a-c), which correlated well 
with the induction of intracellular lactate and histone Kla levels at these 
later time points (Fig. 3a-c). Gene Ontology (GO) analysis revealed that 
these H3K18la-specific genes were enriched in biological pathways that 
are independent of inflammation (Extended Data Fig. 5d). One of these 
[M1/M0] ≥ 1 and H3K18la-log2[M1/M0] < 0.5, H3K18ac-specific). g, h, Heat maps 
showing gene expression kinetics (using reads per kilobase of transcript per 
million mapped reads (RPKM) values from RNA-seq) of exemplar inflammatory 
genes (g) and H3K18la-specific genes (h). The colour key represents log2-transformed 
fold change relative to gene expression at 0 h. n = 4 biological 
replicates. i, j, BMDM cells were infected with indicated Gram-negative bacteria 
or LPS, respectively. i, Histone Kla levels were measured by immunoblot at 24 h 
after bacterial challenge. ‘+’ indicates lower dose, and ‘++’ indicates higher dose. 
j, Gene expression was analysed by quantitative PCR with reverse transcription 
(RT-qPCR) at indicated time points after bacterial challenge. n = 3 biological 
replicates. k, Protein levels of inducible nitric oxide synthase (iNOS) and ARG1 
were analysed by immunoblots from BMDMs activated by the indicated stimuli. 
Data in a-c, j are mean and s.e.m. Data in d, i and k represent three independent 
experiments. 
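
The heat-map values described in the legend combine two routine calculations: RPKM normalization and a log2 fold change relative to the 0 h time point. The sketch below illustrates both with the standard RPKM definition; the read counts, gene length and RPKM series are invented for illustration, and the function names are hypothetical rather than taken from the authors' pipeline.

```python
import math


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def log2_fold_change_vs_baseline(values):
    """log2 of each value relative to the first (0 h) value.

    Assumes every value is positive; a pseudocount would be needed
    if any RPKM value were zero.
    """
    baseline = values[0]
    return [math.log2(value / baseline) for value in values]


# Hypothetical gene: 500 reads, 2 kb transcript, 20 million mapped reads.
example_rpkm = rpkm(500, 2000, 20_000_000)
print(example_rpkm)  # 12.5

# Hypothetical RPKM values at 0, 4, 8, 16 and 24 h.
rpkm_series = [10.0, 40.0, 20.0, 10.0, 5.0]
log2fc = log2_fold_change_vs_baseline(rpkm_series)
print(log2fc)  # [0.0, 2.0, 1.0, 0.0, -1.0]
```

In this invented series the gene peaks at 4 h and then declines, which is the shape the text describes for early inflammatory genes.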


enriched pathways was wound healing (for example, Arg1), which has 
been associated with the M2-like phenotype (Fig. 3h and Extended Data 
Fig. 5d). To corroborate these findings with more physiologically relevant 
stimuli, we treated BMDMs (M0) with live or dead Gram-negative 
bacteria (Escherichia coli, Acinetobacter baumannii and Pseudomonas 
aeruginosa) to stimulate M1 polarization. Similar to treatment with 
LPS, bacteria induced lactate production and global histone Kla but not 
histone Kac levels (Fig. 3i and Extended Data Fig. 5e, f), and the kinetics 
of early cytokine and late Arg1 expression were maintained (Fig. 3j and 
Extended Data Fig. 5g-j). 

Arginine metabolism is a key catabolic and anabolic process that 
is regulated during macrophage polarization. M1 macrophages are 
thought to have low levels of ARG1 and metabolize arginine via nitric 
oxide synthase to produce nitric oxide to kill pathogens, whereas M2 
macrophages have high levels of ARG1, which produces ornithine to 
facilitate wound healing. Consistent with their RNA dynamics, ARG1 
protein levels and activity were markedly increased 24-48 h after M1 
polarization, whereas NOS2 protein levels and function peaked 12 h after 
M1 polarization and declined at later time points (Fig. 3k and Extended 
Data Fig. 5k). Collectively, these findings suggest that induction of lactate 
during M1 activation might promote a late-phase switch to a more 
homeostatic phenotype, which shares some similarity with the M2-like 
phenotype. Indeed, previous studies showed that treating BMDMs with 
lactate derived from tumour cells drives an M2-like phenotype that 
is characteristic of tumour-associated macrophages (TAMs). Using 
mouse cancer models, we observed a positive correlation between Arg1 
expression and histone Kla levels, but not histone Kac levels in TAMs 
Fig. 4 | Lactate activates M2-like gene expression through histone Kla. 

a-d, Decreased lactate production in LDHA-deficient (myeloid-specific Ldha−/−; 
mLdha−/−) BMDM cells resulted in lowered histone Kla levels and Arg1 expression 
during M1 polarization. fl/fl, littermate control mice. Intracellular lactate levels 
were measured using a lactate colorimetric kit (a) and global histone Kla levels 
were measured by immunoblots (b) 24 h after M1 polarization. c, Gene 
expression was analysed by RT-qPCR at indicated time points after M1 
polarization. a-c, n = 3 biological replicates. d, H3K18la occupancy was analysed 
by ChIP-qPCR 24 h after M1 polarization. Data represent three technical 
replicates from pooled samples. e-h, Exogenous lactic acid (LA) (25 mM) was 


isolated from B16F10 melanoma and LLC1 lung tumours (Extended 
Data Fig. 6a-e). 

Changes in gene expression during M1 polarization are caused by complex 
signalling cascades induced by LPS and IFNγ, including the induction 
of lactate and histone Kla. To substantiate the role of lactate and 
histone Kla in the regulation of gene expression, we manipulated levels 
of lactate during M1 polarization and examined its effect on the expression 
of Arg1, an M2-like gene. We first lowered lactate levels by deleting 
Ldha (LysM-Cre+/− Ldhafl/fl; Extended Data Fig. 7a-c). Lactate production 
and global histone Kla levels were both decreased in LDHA-deficient 
macrophages during M1 polarization (Fig. 4a, b). Although deleting 
Ldha in macrophages did not alter the expression of pro-inflammatory 
cytokines (Extended Data Fig. 7d-g), it attenuated Arg1 expression and 
decreased histone Kla marks at the Arg1 promoter (Fig. 4c, d). Similar findings 
were obtained when macrophages were M1 polarized in the presence of 
glycolysis inhibitors (2-DG, DCA and GNE-140) (Extended Data Fig. 7h-m). 
Next, we increased lactate levels by treating M1 macrophages with exogenous 
lactate. Exogenous lactate increased intracellular lactate (Fig. 4e) 
and histone Kla levels (Fig. 4f), and induced Arg1 expression (Fig. 4g) 
and Kla levels at the Arg1 promoter (Fig. 4h). By contrast, exogenous 
lactate did not affect the expression of early pro-inflammatory genes 
(Extended Data Fig. 8a-d). In addition, exogenous lactate enhanced the 
expression of other M2-like genes, such as Vegfa, during M1 polarization 
(Extended Data Fig. 8e-h and Supplementary Table 5). Thus, these data 
confirmed the positive role of lactate and histone Kla in driving expression 
of M2-like genes during M1 macrophage polarization. 


Histone Kla directly stimulates gene transcription 


Our observed correlations between lactate, H3K18la and M2-like gene 
expression do not necessarily indicate that the histone Kla mark was 
a causative factor. Previous studies showed that exogenous lactate can 
alter Arg1 and Vegfa expression in unstimulated (M0) macrophages by 


added to BMDM cells 4 h after M1 polarization (LPS + IFNγ), and cells were 
collected at indicated time points after M1 polarization for intracellular lactate 
measurement (e), histone Kla immunoblot analysis (f), gene expression analysis 
(g) and H3K18la occupancy analysis by ChIP-qPCR (h). e, n = 3 biological 
replicates. f, Data represent three independent experiments. g, n = 4 biological 
replicates. h, Data represent three technical replicates from pooled samples. 
Data in a, c-e, g and h are mean and s.e.m.; statistical significance was 
determined using multiple t-tests corrected using the Holm-Sidak method 
(a, c, e, g). 


HIF1α®. However, HIF1α is unlikely to be important for regulating Arg1 
and Vegfa during M1 polarization as HIF1α protein was induced at early 
time points and bound to promoters of glycolytic genes but not Arg1 
and Vegfa (Extended Data Fig. 8i-m). 

To examine whether histone Kla has a direct role in transcriptional 
regulation, we took advantage of a cell-free, recombinant chromatin- 
templated histone modification and transcription assay (Extended Data 
Fig. 9a) that was used previously to demonstrate direct transcriptional 
activation by p53- and p300-dependent histone Kac". This assay, in 
which acetyl-CoA was replaced by L-lactyl-CoA (validated by HPLC and 
mass spectrometry; Extended Data Fig. 9h-k), demonstrated robust 
p53-dependent, p300-mediated H3 and H4 lactylation (Extended Data 
Fig. 9b) and a corresponding effect on transcription (Extended Data 
Fig. 9c). The effects paralleled those observed for acetyl-CoA-dependent 
histone acetylation and transcription. To confirm that transcription was 
directly mediated by lactylation of histones, rather than other proteins in 
the nuclear extract, recombinant chromatin was reconstituted with core 
histones bearing lysine-to-arginine mutations in histone tails”. Com- 
pared with wild-type histones, the H3 and H4 mutations, but not the H2A 
or H2B mutations, eliminated p300- and p53-dependent transcription 
(Extended Data Fig. 9d). Together, these findings suggest that, similar 
to histone acetylation, histone lactylation can directly promote gene 
transcription under the described conditions. To examine the poten- 
tial activity of p300 as a histone Kla writer in cells, we overexpressed 
p300 in HEK293T cells and observed a modest increase in histone Kla 
levels (Extended Data Fig. 9e). By contrast, p300 deletion in HCT116 and 
HEK293T cells decreased histone Kla levels (Extended Data Fig. 9f, g). 
Although we cannot exclude an indirect effect by p300 in these cells, 
together with the in vitro enzymatic results, these data suggest that 
p300 is a potential histone Kla writer protein. 

In response to bacterial infection, macrophages must react rapidly 
with a substantial pro-inflammatory burst to help kill bacteria and 
recruit additional immune cells to the infection site. During this process, 
macrophages switch to aerobic glycolysis”, which is thought to support 
pro-inflammatory cytokine expression during M1 activation” and pro- 
duce the Warburg effect. Over time, this metabolic switch also increases 
intracellular lactate, which we show stimulates histone lysine lactylation 
16-24 h after exposure to M1-polarizing stimuli. Histone lactylation is not 
required for the induction or suppression of pro-inflammatory genes. 
Instead, it serves as a mechanism to initiate expression of homeostatic 
genes that have been traditionally associated with M2-like macrophages. 
Our studies support a model in which the switch to aerobic glycolysis 
that occurs during M1 polarization starts a ‘lactate timer’ that uses an 
epigenetic mechanism to induce M2-like characteristics in the late phase, 
perhaps to assist with repairing collateral damage incurred by the host 
during infection. 

High levels of lactate (for example, 40 mM in certain types of tumour 
tissue”) are also associated with major hallmarks of diseases such as cancer. 
Given that the Kla modification can be stimulated by lactate and 
contribute to gene expression, the Kla modification is likely to fill an 
important knowledge gap in our understanding of diverse physiopa- 
thology (for example, infection, cancer) with which lactate is intimately 
associated. 
Methods 


Materials 

Pan anti-Kac (PTM-101), pan anti-Kla (PTM-1401), anti-H3K18la (PTM- 
1406), anti-H4K5la (PTM-1407) and anti-H4K8la (PTM-1405) antibodies 
were generated by PTM Bio Inc.; anti-histone H3 (ab12079), anti-H3K18ac 
(ab1191) and anti-H3K27ac (ab4729) antibodies were purchased from 
Abcam; Drosophila spike-in antibody (61686) and spike-in chromatin 
(53083) were obtained from Active Motif; anti-LDHA (2012S) antibody 
was from Cell Signaling Technology; anti-α-tubulin (05-829) 
and anti-LDHB (ABC927) antibodies were from Millipore Sigma; anti-HIF-1α 
(NB100-105) antibody was from Novus Biologicals; anti-iNOS 
(GTX130246) and anti-ARG1 (GTX109242) antibodies were purchased 
from GeneTex; anti-p300 (sc-584) was from Santa Cruz Biotechnology; 
anti-CD11b monoclonal antibody (M1/70), PE-cyanine7 (25-0112-82) 
and anti-F4/80 monoclonal antibody (BM8), APC (17-4801-82) were 
from Thermo Fisher Scientific; lipopolysaccharides from Escherichia 
coli O111:B4 (L4391), sodium L-lactate (71718), L-(+)-lactic acid (L6402), 
sodium dichloroacetate (347795), cobalt(II) chloride hexahydrate 
(C8661), rotenone (R8875), and acetyl-CoA (A2056) were purchased 
from Sigma-Aldrich; sodium L-lactate (13C3, 98%) (CLM-1579-PK) and 
D-glucose (U-13C6, 99%) (CLM-1396-1) were purchased from Cambridge 
Isotope Laboratories. Recombinant mouse IFNγ protein (485-MI-100) 
was from R&D Systems; mouse IL-4 (130-097-760) was from Miltenyi 
Biotec; modified sequencing-grade trypsin was from Promega; lactate 
colorimetric assay kit II (K627-100), arginase activity colorimetric assay 
kit (K755-100), and nitric oxide synthase (NOS) activity assay kit (K205- 
100) were purchased from Biovision. 


Cell culture 

MCF-7, MDA-MB-231, HeLa, A549, HepG2, MEF and RAW 264.7 cells were 
obtained from the American Type Culture Collection and cultured in 
DMEM supplemented with 10% FBS and 1% GlutaMAX (GIBCO). Cells 
were routinely tested for mycoplasma contamination (MP00335, Sigma- 
Aldrich), and only negative cells were used in experiments. No specific 
cell line authentication was performed. For growth under hypoxic conditions, 
cells were grown in a specialized, humidified chamber equilibrated 
with 1% oxygen, 94% nitrogen, 5% carbon dioxide for the indicated time. 


Mouse experiments 

All animal use and experiments performed were approved by Institu- 
tional Animal Care and Use Committee (ACUP 72209) at the University 
of Chicago. Ldhafl/fl mice (Jackson Laboratory, 030112) and LysM-Cre 
mice (Jackson Laboratory, 004781) were used to generate LysM-Cre+/− Ldhafl/fl 
and littermate control LysM-Cre−/− Ldhafl/fl mice. The following 
primers were used for genotyping: Ldha forward: CTGAGCACACCCATGTGAGA 
and Ldha reverse: AGCAACACTCCAAGTCAGGA; LysM-cre (LysM 
is also known as Lyz2): CCCAGAAATGCCAGATTACGG; LysM common: 
CTTGGGCTGCCAGAATTTCTC; and LysM WT: TTACAGTCGGCCAGGCTGAC. 
Macrophages were derived from bone marrow of 8-week-old male 
C57BL/6 mice following the published procedure¹⁸. To induce an M1 or 
M2 phenotype, BMDM cells were stimulated with 5 ng ml⁻¹ of LPS and 
12 ng ml⁻¹ of IFNγ, or 20 ng ml⁻¹ of IL-4, for 24 h or the indicated time. To 
infect BMDM cells with bacteria, overnight cultures of E. coli, A. baumannii 
or P. aeruginosa were diluted in RPMI-1640 and added to BMDM cells 
in 6-well plates at a multiplicity of infection of 2 or 20. A control plate was 
either infected with paraformaldehyde-killed bacteria or treated with 
5 ng ml⁻¹ LPS in the absence of bacteria. The plates were centrifuged at 
975g for 30 min to promote infection, followed by a 30 min incubation 
in a humidified incubator at 37 °C and 5% CO₂. To kill extracellular bacteria, 
the medium overlying the confluent cell monolayer was replaced with 
fresh medium containing gentamicin at 100 μg ml⁻¹ and the plates were 
further incubated for 1 h. After incubation, media were removed from 
infected cells and replaced with fresh media containing 25 μg ml⁻¹ of 
gentamicin. For consistency, LPS-treated cells and cells infected with 


dead bacteria were also treated with gentamicin. Cells were cultured for 
24 h before lysis. Allocation of BMDM cells into different treated groups 
was randomized and not blinded. 


Tumour inoculation and TAM isolation 

LLC1 cells (0.5 × 10⁶) or B16F10 cells (1 × 10⁶) were injected into 7-week-old 
C57BL/6 mice (Jackson Laboratory). Once tumours reached approximately 
600 mm³, mice were killed for tumour isolation. Tumours were 
digested with type 4 collagenase (Worthington, 3 mg ml⁻¹) and hyaluronidases 
(Sigma, 1.5 mg ml⁻¹) in 1% BSA/PBS at 37 °C with shaking at 
200 r.p.m. for 30 min. The digested tumour was then filtered through 
a 70-μm cell strainer, followed by a red blood cell lysis step and passage 
through another 40-μm strainer. Cells were resuspended in isolation 
buffer (0.1% BSA/PBS, 2 mM EDTA), layered onto Ficoll-Paque PLUS (GE 
Healthcare), and centrifuged at 450g for 30 min without brake. Mononuclear 
immune cells were obtained by taking out the middle white layer. 
TAMs were then isolated using CD11b Microbeads (Miltenyi Biotec) according 
to the manufacturer's instructions. The purity of TAMs was confirmed by 
flow cytometry using CD11b and F4/80 antibodies. Data were quantified 
using FlowJo v.10.4.1. 


Peptide immunoprecipitation 

Histones from human MCF-7 or mouse BMDM cells were extracted using 
a standard acid-extraction protocol¹⁹, and subjected to trypsin digestion 
as per the manufacturer’s instructions. Pan anti-Kla or pan anti-Kac anti- 
bodies were first conjugated to Protein A Sepharose beads (GE Health- 
care BioSciences) and then incubated with tryptically digested histone 
peptides with gentle agitation overnight at 4 °C. The beads were then 
washed three times with NETN buffer (50 mM Tris-Cl pH 8.0, 100 mM 
NaCl, 1 mM EDTA, 0.5% NP-40), twice with ETN buffer (50 mM Tris-Cl 
pH 8.0, 100 mM NaCl, 1 mM EDTA) and once with water. Peptides were 
eluted from the beads with 0.1% TFA and dried in a SpeedVac system 
(Thermo Fisher Scientific). 


HPLC-MS/MS analysis 

The peptide samples were loaded onto a homemade capillary column 
(10 cm length × 75 μm ID, 3 μm particle size, Dr. Maisch GmbH) connected 
to an EASY-nLC 1000 system (Thermo Fisher Scientific). Peptides 
were separated and eluted with a gradient of 2% to 90% HPLC buffer B 
(0.1% formic acid in acetonitrile, v/v) in buffer A (0.1% formic acid in 
water, v/v) at a flow rate of 200 nl min⁻¹ over 60 min (34 min for coelution 
studies). The eluted peptides were then ionized and analysed by 
a Q-Exactive mass spectrometer (Thermo Fisher Scientific). Full mass 
spectrometry was acquired in the Orbitrap mass analyser over the range 
m/z 300 to 1,400 with a resolution of 70,000 at m/z 200. The 12 most 
intense ions with charge >2 were fragmented with normalized collision 
energy of 27 and tandem mass spectra were acquired with a mass resolution 
of 17,500 at m/z 200. 


Isotopic-labelling experiments 

MCF-7 cells were cultured in DMEM high-glucose media plus 10% FBS. 
To be labelled by isotopic lactate, cells were treated with 10 mM of ¹³C₃ 
sodium L-lactate for 24 h. To be labelled by isotopic glucose, cells were 
switched to DMEM No-Glucose media (Gibco) for 24 h, followed by supplementation 
with 25 mM of U-¹³C₆ D-glucose and continued culturing 
for three passages. Histones were extracted, digested with trypsin, 
immunoprecipitated using a pan anti-Kla antibody and analysed by 
HPLC-MS/MS as described above. 


SILAC-based quantification 

MCF-7 cells were cultured in either ‘heavy’ (L-Lys-¹³C₆, ¹⁵N₂) or ‘light’ 
(L-Lys-¹²C₆, ¹⁴N₂) DMEM, supplemented with 10% dialysed FBS (Serum 
Source International Inc.), for more than six passages, to achieve more 
than 99% labelling efficiency. Heavy-labelled and light-labelled cells 
were mixed in a 1:1 ratio. Histones were extracted, digested with trypsin, 
immunoprecipitated using a pan anti-Kla antibody, and analysed by 
HPLC-MS/MS as described above. Quantification was performed using 
MaxQuant²⁰. The H/L ratio derived from MaxQuant was then normalized 
by protein abundance. 
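
The normalization step can be illustrated with a minimal sketch. It assumes that "normalized by protein abundance" means dividing the H/L ratio of a modified peptide by the H/L ratio of its parent protein; the text does not spell out the formula, so this reading, the function name and the numbers are all hypothetical.

```python
def normalize_hl_ratio(site_ratio, protein_ratio):
    """Divide a site-level H/L ratio by the H/L ratio of the parent protein.

    Assumption: this is one common way to correct a modification-site ratio
    for changes in protein abundance; the source does not state its formula.
    """
    if site_ratio <= 0 or protein_ratio <= 0:
        raise ValueError("H/L ratios must be positive")
    return site_ratio / protein_ratio


# Hypothetical values: a Kla peptide with H/L = 3.0 on a histone with H/L = 1.5.
normalized = normalize_hl_ratio(3.0, 1.5)
print(normalized)
```

A normalized ratio above 1 would then indicate a change at the modification site beyond what the change in histone abundance alone explains.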


Synthesis of L-lactyl-CoA 

L-Lactic acid (90 mg, 1 mmol) was dissolved in 5 ml of freshly distilled 
CH₂Cl₂. N-hydroxysuccinimide (115 mg, 1 mmol) was added to this solution, 
and the reaction mixture was sonicated to obtain a clear solution. 
Then, N,N′-dicyclohexylcarbodiimide (DCC, 227 mg, 1.1 mmol) was 
added. A white precipitate formed after addition. The reaction mixture 
was stirred at room temperature overnight. Then the white precipitate 
was filtered and washed with CH₃CN. The resulting organic solvent was 
evaporated by vacuum to afford crude product L-lactyl-NHS (170 mg, 
91% yield), which was used in the next step without further purification. 
CoA hydrate (0.0065 mmol; 5 mg) was dissolved in 1.5 ml of 0.5 M 
NaHCO₃ (pH 8.0) and cooled in an ice bath. Then, L-lactyl-NHS 
(2.5 mg, 0.013 mmol) in 0.5 ml of CH₃CN/acetone (1:1 v/v) was added 
dropwise to the CoA solution. The reaction solution was stirred at 4 °C 
overnight and then quenched by adjusting pH to 4.0 with 1.0 M HCl. 
The reaction mixture was then subjected to RP-HPLC purification with 
gradient 5-45% buffer A in buffer B over 30 min at flow rate 5 ml min⁻¹; 
UV detection wavelength was fixed at 214 and 254 nm (HPLC buffer 
A: 0.05% TFA in water; HPLC buffer B: 0.05% TFA in acetonitrile). The 
fractions were collected and lyophilized after flash-freezing with liquid 
nitrogen. m = 2 mg, yield 38%. ¹H NMR (400 MHz, deuterium oxide) 
δ 8.57 (s, 1H), 8.33 (s, 1H), 6.12 (d, J = 5.7 Hz, 1H), 4.49 (s, 1H), 4.29-4.24 
(m, 1H), 4.14 (s, 2H), 3.93 (s, 1H), 3.75 (d, J = 8.6 Hz, 1H), 3.48 (d, J = 7.6 Hz, 
1H), 3.35 (t, J = 6.4 Hz, 2H), 3.22 (d, J = 5.2 Hz, 3H), 2.89 (q, J = 6.2 Hz, 2H), 
2.32 (t, J = 6.4 Hz, 2H), 1.23 (d, J = 6.9 Hz, 3H), 0.83 (s, 3H), 0.70 (s, 3H). 
MALDI m/z calculated for C₂₄H₄₁N₇O₁₈P₃S⁺ [M + H]⁺: 840.1, found 839.6. 
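
As an arithmetic check on the first step, the sketch below recomputes the reported 91% crude yield of L-lactyl-NHS. It assumes the molecular formula C7H9NO5 for L-lactyl-NHS and standard average atomic masses, neither of which is given in the text; the function names are illustrative only.

```python
# Average atomic masses in g/mol (standard values, not taken from the source).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def molar_mass(formula):
    """Molar mass in g/mol of a formula given as {element: atom count}."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def percent_yield(product_mg, product_molar_mass, limiting_mmol):
    """Percent yield from product mass (mg) and limiting reagent amount (mmol)."""
    product_mmol = product_mg / product_molar_mass  # mg / (g/mol) = mmol
    return 100.0 * product_mmol / limiting_mmol


# Assumed formula for L-lactyl-NHS (lactic acid esterified with NHS).
LACTYL_NHS = {"C": 7, "H": 9, "N": 1, "O": 5}

# 170 mg of crude product from 1 mmol of L-lactic acid, as stated in the text.
crude_yield = percent_yield(170.0, molar_mass(LACTYL_NHS), 1.0)
print(round(crude_yield))
```

Under these assumptions the result rounds to the 91% reported for the crude intermediate.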


In vitro chromatin template-based histone modification and 
transcription assays 

Purification of recombinant proteins and chromatin assembly were 
performed as previously described’. The chromatin-templated histone 
modification and transcription assays were as described previously”, 
except that lactyl-CoA was used in place of acetyl-CoA and [α-³²P]CTP 
was used in place of [α-³²P]UTP. The H3KR, H4KR, H2AKR and H2BKR 
histone mutants were the same as previously described’. Histone modifications 
were monitored by immunoblot and transcription products 
were monitored by autoradiography as described®. 


RNA-seq 

Total RNA was extracted from BMDM cells activated as indicated using 
an RNeasy Plus Mini Kit (74134, Qiagen). Two to four micrograms of 
total RNA were used as starting material to prepare libraries using 
Illumina TruSeq Stranded mRNA Library Prep Kit Set A (RS-122-2101, 
Illumina). The size of the libraries was selected by using the Agencourt 
AMPure XP beads (A63882, Beckman Coulter), with average size of 
400 bp. The libraries were sequenced using Illumina HiSeq 4000 
(paired-end, 50 bp). 

Bioinformatic analysis of RNA-seq data: sequencing quality was evaluated 
by FastQC v.0.11.4. All reads were mapped to the reference genome 
of Illumina iGenomes UCSC mm10 using HISAT2 v.2.1.0²¹. Differential 
expression analysis was implemented using edgeR v.3.16.5²², after retaining 
only genes for which counts per million (cpm) was larger than one 
in four samples and normalizing the library sizes across samples using 
the TMM method of the edgeR package. Hierarchical clustering was 
performed and heat maps were generated using Perseus v.1.6.1.1 
(http://www.coxdocs.org/doku.php?id=perseus:start). The log₂-transformed 
gene expression values (RPKM) were normalized by subtracting the 
mean in every row, and hierarchically clustered with a Pearson correlation 
algorithm. Gene Ontology analysis (GOTERM_BP_DIRECT) was 
carried out using DAVID bioinformatics resources 6.8²³,²⁴. 
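
The gene-filtering and row-centring steps described above can be sketched in a few lines. This is an illustration in plain Python, not the edgeR or Perseus implementation: it reads "larger than one in four samples" as "cpm > 1 in at least four samples" (an assumption), it computes cpm from raw library sizes without TMM scaling, and the function names and toy counts are hypothetical.

```python
def counts_per_million(counts):
    """Convert a genes x samples count matrix (list of rows) to cpm."""
    library_sizes = [sum(column) for column in zip(*counts)]
    return [
        [count * 1e6 / size for count, size in zip(row, library_sizes)]
        for row in counts
    ]


def keep_expressed(counts, min_cpm=1.0, min_samples=4):
    """Flag genes whose cpm exceeds min_cpm in at least min_samples samples."""
    cpm = counts_per_million(counts)
    return [sum(value > min_cpm for value in row) >= min_samples for row in cpm]


def centre_rows(log_values):
    """Subtract each row's mean, as done before hierarchical clustering."""
    centred = []
    for row in log_values:
        row_mean = sum(row) / len(row)
        centred.append([value - row_mean for value in row])
    return centred


# Toy matrix: three genes, four samples, each library summing to one million reads.
toy_counts = [
    [10, 10, 10, 10],
    [0, 0, 0, 1],
    [999990, 999990, 999990, 999989],
]
print(keep_expressed(toy_counts))
print(centre_rows([[1.0, 2.0, 3.0]]))
```

Row centring removes differences in absolute expression level between genes, so that clustering on Pearson correlation groups genes by the shape of their profile across samples.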


The following primers were used for RT-qPCR analysis: Arg1: 
CTCCAAGCCAAAGTCCTTAGAG, AGGAGCTGTCATTAGGGACATC; Vegfa: 
CCACGACAGAAGGAGAGCAGAAGTCC, CGTTACAGCAGCCTGCACAGCG; 
Il6: GTTCTCTGGGAAATCGTGGA, TTTCTGCAAGTGCATCATCG; Il1b: 
TTTGACAGTGATGAGAATGACC, CTCTTGTTGATGTGCTGCTG; Ifnb1: 
CAGCTCCAAGAAAGGACGAAC, GGCAGTGTAACTCTTCTGCAT; Cxcl10: 
CCAAGTGCTGCCGTCATTTTC, GGCTCGCAGGGATGATTTCAA; Tnfa: 
CCCTCACACTCAGATCATCTTCT, GCTACGACGTGGGCTACAG; and Rn18s 
(18S rRNA): GTAACCCGTTGAACCCCATT, CCATCCAATCGGTAGTAGCG. 


ChIP-seq 

Native ChIP was carried out following the published protocol²⁵ with 
spike-in for normalization. Spike-in was carried out according 
to vendor protocols (61686, Active Motif). In brief, 50 ng of Spike-in 
chromatin (53083, Active Motif) was added to 25 μg of BMDM chromatin 
and incubated with 2 μg Spike-in antibody (61686, Active Motif) together 
with 4 μg of anti-H3K18la or anti-H3K18ac antibodies. After 4 h of incubation 
at 4 °C, Protein A Sepharose (17-5280-01, GE Healthcare Life Sciences) 
was added and incubated for another 2 h, followed by sequential 
washes with buffer TSE I (0.1% SDS, 1% Triton X-100, 2 mM EDTA, 20 mM 
Tris-HCl pH 8.0, 150 mM NaCl), TSE II (0.1% SDS, 1% Triton X-100, 2 mM 
EDTA, 20 mM Tris-HCl pH 8.0, 500 mM NaCl), buffer III (0.25 M LiCl, 1% 
NP-40, 1% deoxycholate, 1 mM EDTA, 10 mM Tris-HCl pH 8.0), and TE 
buffer (1 mM EDTA, 10 mM Tris-HCl pH 8.0). Chromatin DNA was finally 
eluted with buffer containing 1% SDS and 0.1 M NaHCO₃. The eluates 
were digested with RNase A (12091021, Thermo Fisher Scientific) and 
proteinase K (AM2546, Thermo Fisher Scientific). DNA was recovered 
using the QIAquick PCR purification kit (28106, Qiagen) according to 
the manufacturer’s instructions. 

ChIP-seq libraries were constructed with an Accel-NGS 2S Plus DNA 
Library Kit (Swift Biosciences) according to the manufacturer's protocol. 
The libraries were then amplified and assessed for fragment size using 
TapeStation (Agilent) and quantified using a Qubit dsDNA HS Assay 
Kit (Thermo Fisher Scientific). The indexed libraries were pooled and 
sequenced on a HiSeq 4000 Sequencer (Illumina) using the 50-nucleotide 
single-read configuration. 

Bioinformatics analysis of ChIP-seq data: sequencing quality was 
evaluated by FastQC v.0.11.4. All reads were mapped to the reference 
genome of Illumina iGenomes UCSC mm10 using Bowtie v.2.2.6²⁶,²⁷, and 
only uniquely mapped reads were retained. Then SAMtools v.0.1.19²⁸ was 
used to convert files to bam format, sort, and remove PCR duplicates. 
Peaks were called using MACS v.2.2.1²⁹ under q = 0.01. To quantify and 
directly compare H3K18la or H3K18ac in different samples (M0 and 
M1 macrophages), the uniquely mapped H3K18la or H3K18ac reads 
in promoter regions (±2 kb around transcriptional start sites) of each 
gene were counted by featureCounts v.1.5.0-p1³⁰, and then normalized 
by Spike-in ChIP read counts of the corresponding condition (M0 or M1 
macrophages). Genes present in both the ChIP-seq and RNA-seq data were 
used for all subsequent analysis. Gene Ontology analysis (GOTERM_BP_DIRECT) 
was carried out using DAVID Bioinformatics Resources 6.8²³,²⁴. 
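
A minimal sketch of the spike-in normalization described above is given here. It assumes that promoter read counts are divided by the number of spike-in (Drosophila) reads recovered in the same condition and rescaled to a fixed number of spike-in reads; the exact scaling used in the study is not stated, and the function name, scale factor and read counts below are hypothetical.

```python
def spike_in_normalize(promoter_counts, spike_in_reads, scale=1e6):
    """Scale promoter read counts by the spike-in read count of one condition.

    promoter_counts: {gene: reads within +/-2 kb of the transcriptional start site}
    spike_in_reads:  uniquely mapped spike-in reads for the same condition
    scale:           arbitrary constant (assumed), here reads per million spike-in reads
    """
    if spike_in_reads <= 0:
        raise ValueError("spike-in read count must be positive")
    factor = scale / spike_in_reads
    return {gene: count * factor for gene, count in promoter_counts.items()}


# Hypothetical H3K18la promoter counts for one gene in M0 and M1 macrophages.
m0 = spike_in_normalize({"Arg1": 50}, spike_in_reads=2_000_000)
m1 = spike_in_normalize({"Arg1": 300}, spike_in_reads=1_000_000)
print(m0["Arg1"], m1["Arg1"])
```

Because the same amount of spike-in chromatin is added to every sample, dividing by its recovered reads lets global gains or losses of a mark be compared between conditions, which normalization to total mapped reads would mask.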

The following primers were used for qPCR analysis of gene promoter 
regions in human cells: FOXO3 (previously known as FOXO3A) 
promoter: CAGTGAGTGTGTGCAGCTTG, AAAGCCTCCTGTTTGTGCTT; 
FOXO3 downstream: TGCACACAGAAGCCAGAAG, GCTCCCCACAGAGACGTAA; 
LDHA promoter: TAAGGGTGGGGGATACCTCT, CCCAAGAGAAAAATGCAAGC. 
The following primers were used for qPCR analysis of gene promoter 
regions in mouse cells: Arg1/Arg1-PTM: AAGCTGTGGCCTCAGAACAT, 
GGTAACCGCTGTGAAAGGAT; Arg1-HRE-1kb: CCCGAGTTTGACCCGAAGAA, 
CTTTACACAGGGACCGGACC; Arg1-HRE-2kb: TGTCTCTCCCAGTTTCCCCA, 
AGCAACTTGGCATCTGATGGA; Vegfa/Vegfa-PTM: CGAGCTAGCACTTCTCCCAG, 
AACTTCTGGGCTCTTCTCGC; Vegfa-HRE-1kb: GGCACCAAATTTGTGGCACT, 
CTGCCAGACTACACAGTGCA; Vegfa-HRE-2kb: ACCTGATCCTGATCCCTGCT, 
CAGCCTCTGTTATGCCACGA; Vegfa-HRE-3kb: GCAGAACCTAGGCTTCACGT, 
TTGAAAGGGCTGACATGGCT; Eno1: AAGGTCATCAGCAAGGTCGT, 
CGTACTCCGAGTCTCACACG; Glut1 (also known as Slc2a1): 
TAGATCCCCTCCCTCTTGCT, GAACACGTAGCCTGCTCACA; gene desert: 
CTGCCAGGGTTGTAGAGAGG, GCCAGATCATATTGGCTTGG. 


Statistical analysis 

No statistical methods were used to predetermine sample size. The significance 
of differences in the experimental data was determined using 
GraphPad Prism 7.0 software. All data involving statistics are presented 
as mean ± s.e.m. For data presented without statistics, experiments were 
repeated at least three times to ensure reproducibility, unless otherwise 
stated. The experiments were not randomized, and investigators were 
not blinded to allocation during experiments and outcome assessment. 
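
For readers who want to reproduce the summary statistics outside GraphPad Prism, the sketch below computes mean and s.e.m. and applies a step-down Holm-Sidak adjustment of the kind named in the Fig. 4 legend. It is an illustration under stated assumptions: the functions are written from the standard definitions, not taken from Prism, and the replicate values and P values are hypothetical.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) for a list of replicates."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two replicates are needed for s.e.m.")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


def holm_sidak(p_values):
    """Step-down Holm-Sidak adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, index in enumerate(order):
        # The k-th smallest P value is compared against m - k + 1 hypotheses.
        candidate = 1.0 - (1.0 - p_values[index]) ** (m - rank)
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[index] = min(running_max, 1.0)
    return adjusted


# Hypothetical data: three biological replicates and three raw P values.
average, sem = mean_sem([1.0, 2.0, 3.0])
adjusted_p = holm_sidak([0.01, 0.04, 0.03])
print(average, sem, adjusted_p)
```

The running maximum keeps the adjusted values in the same order as the raw P values, so a larger raw P value never ends up with a smaller adjusted one.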


Reporting summary 


Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


The ChIP-seq and RNA-seq data have been made available at the Gene 
Expression Omnibus (GEO) repository under the accession number 
GSE115354. The mass spectrometry proteomics data have been deposited 
to the ProteomeXchange Consortium via the PRIDE³¹ partner repository 
with the dataset identifier PXD014870. All other data are available 
from the authors upon reasonable request. 


18. Kratz, M. et al. Metabolic dysfunction drives a mechanistically distinct proinflammatory 
phenotype in adipose tissue macrophages. Cell Metab. 20, 614-625 (2014). 

19. Shechter, D., Dormann, H. L., Allis, C. D. & Hake, S. B. Extraction, purification and analysis 
of histones. Nat. Protocols 2, 1445-1457 (2007). 

20. Cox, J. & Mann, M. MaxQuant enables high peptide identification rates, individualized 
p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 
26, 1367-1372 (2008). 

21. Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory 
requirements. Nat. Methods 12, 357-360 (2015). 

22. Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for 
differential expression analysis of digital gene expression data. Bioinformatics 26, 
139-140 (2010). 

23. Huang, W., Sherman, B. T. & Lempicki, R. A. Systematic and integrative analysis of large 
gene lists using DAVID bioinformatics resources. Nat. Protocols 4, 44-57 (2009). 


24. Huang, W., Sherman, B. T. & Lempicki, R. A. Bioinformatics enrichment tools: paths 
toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 37, 
1-13 (2009). 

25. Cuddapah, S. et al. Native chromatin preparation and Illumina/Solexa library 
construction. Cold Spring Harb. Protoc. 2009, pdb.prot5237 (2009). 

26. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 
9, 357-359 (2012). 

27. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient 
alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 
(2009). 

28. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 
2078-2079 (2009). 

29. Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 
(2008). 

30. Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program 
for assigning sequence reads to genomic features. Bioinformatics 30, 923-930 
(2014). 

31. Perez-Riverol, Y. et al. The PRIDE database and related tools and resources in 2019: 
improving support for quantification data. Nucleic Acids Res. 47 (D1), D442-D450 
(2019). 


Acknowledgements HEK293T p300 knockout cells were provided by X. Li. We thank S. 
Khochbin for brainstorming and critical reading of this manuscript. We thank K. Delaney and all 
other members of the Zhao and Becker laboratories for discussions and technical support. 
This work was supported by the University of Chicago, Nancy and Leonard Florsheim family 
fund (Y.Z.), NIH grants R01GM115961, R01DK118266 (Y.Z.), R01DK102960, R01HL137998 (L.B.), 
R01CA129325, R01DK071900 (R.G.R.), and NSF1808087 (Y.G.Z.). 


Author contributions Y.Z. conceived the project and developed the general ideas and 
research strategy. D.Z., L.B. and Y.Z. designed the experimental approach and 
composed the manuscript. D.Z. performed most of the experiments. Z.T. and R.G.R. 
carried out in vitro chromatin-based transcription experiments. Y.W., H.H., W.L., J.D., 
L.D., S.K., S.L. and M.P.-N. contributed to mass spectrometry-related experiments and 
analysis; R.H., Z.Y. and B.R. performed the library construction and next-generation 
sequencing for ChIP-seq and RNA-seq; M.H. and Y.G.Z. synthesized L-lactyl-CoA. H.H. 
and D.Z. analysed ChIP-seq and RNA-seq data. G.Z. provided all primary BMDM cell 
cultures. D.C. and H.A.S. carried out the bacterial infection experiments; C.C. carried 
out TAM experiments. 


Competing interests Y.Z. is a co-founder, board member, and advisor to PTM Bio Inc. L.B. is a 
co-founder and CSO of rMark Bio Inc., and a founder and CEO of Onchilles Pharma Inc. 


Additional information 

Supplementary information is available for this paper at 
https://doi.org/10.1038/s41586-019-1678-1. 

Correspondence and requests for materials should be addressed to L.B. or Y.Z. 

Peer review information Nature thanks Luke O'Neill, Kathryn Wellen and the other, 
anonymous, reviewer(s) for their contribution to the peer review of this work. 

Reprints and permissions information is available at http://www.nature.com/reprints. 


[Extended Data Fig. 1 image. Panels a, c, e: extracted ion chromatograms (relative abundance against retention time, min) of H3K23la, H4K8la and H2BK5la peptides from in vivo, synthetic and mixed samples. Panels b, d: MS/MS spectra of the same peptides. Panels f, g: dot blot and competition assay with peptide libraries (Kla at 1, 4, 16 and 64 ng; K, Kac, Kpr, Kbu, Kbhb, Khib, Kcr and Kma at 64 ng each). Panels h-j: immunoblots for pan Kla, pan Kac and histone H3 in MCF-7 cells treated with L-lactate (0, 1, 5, 25 mM) and in HeLa and MDA-MB-231 cells treated with 25 mM NaCl, NaLa or NaAc. Panel k: MS/MS spectrum of a 13C3-sodium-lactate-derived Kla peptide (H2BK5).] 

Extended Data Fig. 1 | Validation of histone lysine lactylation. 

a, c, e, Extracted ion chromatograms from HPLC-MS/MS analysis of histone Kla 
peptides derived from cultured cells (in vivo), the synthetic counterparts, and 
their mixtures. b, d, MS/MS spectra of histone Kla peptides derived from in vivo, 
the synthetic counterparts, and their mixtures. f, g, Antibody specificity tests by 
dot blot and competition assay. f, Dot blot was carried out with a pan anti-Kla 
antibody and the following peptide libraries. A1, A2, A3 and A4: dots contain 1, 4, 
16 and 64 ng, respectively, of a peptide library containing a lactylated lysine 
residue. B1, B2, B3 and B4: dots contain 64 ng of a peptide library containing an 
unmodified (K), acetylated (Kac), propionylated (Kpr) and butyrylated (Kbu) 
lysine residue, respectively. C1, C2, C3 and C4: dots contain 64 ng of a peptide 
library containing a β-hydroxybutyrylated (Kbhb), 2-hydroxyisobutyrylated 
(Khib), crotonylated (Kcr) and malonylated (Kma) lysine residue, respectively. 
The libraries contained a mixture of CXXXKXXXXX peptides, in which C is 
cysteine, X is a mixture of all 19 amino acids except for cysteine, and K is lysine 
with or without the indicated modifications. g, Competition was carried out by 
incubating the pan anti-Kla antibody with a twofold or tenfold excess of the 
indicated peptide libraries before the dot blot assay. h-j, Exogenous lactate 
boosts histone Kla levels. Immunoblot analysis of histone Kla and Kac from 
human MCF-7 cells treated with indicated doses of L-lactate (h), and from human 
HeLa (i) and MDA-MB-231 (j) cells treated with 25 mM sodium chloride, sodium 
lactate or sodium acetate. k, MS/MS spectra of an isotopically labelled histone 
Kla peptide identified from MCF-7 cells cultured with 10 mM isotopic (¹³C₃) 
sodium L-lactate for 24 h. Data in a-k represent three independent experiments. 


[Extended Data Fig. 2 image. Panels a-c: immunoblots (pan Kla, pan Kac, histone H3) at 0, 1, 5 and 25 mM glucose in A549, HeLa and MEF cells. Panel d: MS/MS spectra (m/z) of the ¹³C₆-glucose-labelled and unlabelled H3K27la peptide. Panels e-h: time courses (0-60 h) of histone Kla and Kac at H2BK5, H3K18, H3K23, H4K8 and H4K12, for control and DCA-treated cells. Panels i-k: immunoblots at 25 mM glucose in MCF-7, HepG2 and MEF cells. Panels l, m: SILAC ratio scatter plots (Kac, Kla) for rotenone and DCA treatment.]


Extended Data Fig. 2 | Histone Kla is modulated by the glycolysis pathway.
a-c, A549 (a), HeLa (b) and mouse embryonic fibroblast (MEF) (c) cells were
cultured with indicated doses of glucose for 24 h, without pyruvate. Histone Kla
and Kac were analysed by immunoblots using indicated antibodies. d, MS/MS
spectra of a U-¹³C₆-glucose-labelled histone Kla peptide and its unlabelled
counterpart from MCF-7 cells. e-h, Quantitative proteomic analysis of histone
extracts from MCF-7 cells cultured in the presence of U-¹³C₆-glucose for 6 h, 12 h,
24 h and 48 h, with or without 10 mM DCA. i-k, Histone Kla and Kac levels were
analysed by immunoblots using whole-cell lysates from MCF-7, HepG2 and MEF
cells exposed to 25 mM glucose for the indicated times. l, m, SILAC-MS/MS
quantification of histone Kla and Kac marks from MCF-7 cells, comparing
rotenone (10 nM, 24 h) versus DMSO treatment (l), and DCA (10 mM, 24 h) versus
PBS treatment (m). SILAC ratio was normalized to protein abundance. Each dot
in the scatter dot plot represents one identified peptide from core histone. Data
are mean ± s.e.m. l, Kac: 1.121 ± 0.05084, n = 31; Kla: 1.599 ± 0.139, n = 25. m, Kac:
1.038 ± 0.03813, n = 49; Kla: 0.6627 ± 0.06376, n = 24. Statistical significance was
determined using two-tailed Welch's t-test. Data in a-d, i-k represent three
independent experiments. Data in e-h represent two independent experiments.
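As an illustration of the Welch's t-test named in this legend, the sketch below recomputes the test statistic and the Welch-Satterthwaite degrees of freedom from the reported mean, s.e.m. and n values alone. It assumes that the comparison is Kla versus Kac within each panel, which the legend does not state explicitly; the function name `welch_from_summary` is hypothetical, and no P value is computed.

```python
import math


def welch_from_summary(mean1, sem1, n1, mean2, sem2, n2):
    """Return (t, df) for Welch's unequal-variance t-test.

    Inputs are group means, standard errors of the mean and group sizes.
    A squared s.e.m. equals the sample variance divided by n.
    """
    v1, v2 = sem1 ** 2, sem2 ** 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Assumed comparison: Kla versus Kac SILAC ratios, values from the legend.
t_l, df_l = welch_from_summary(1.599, 0.139, 25, 1.121, 0.05084, 31)     # panel l
t_m, df_m = welch_from_summary(0.6627, 0.06376, 24, 1.038, 0.03813, 49)  # panel m
print(f"l: t = {t_l:.2f}, df = {df_l:.1f}")
print(f"m: t = {t_m:.2f}, df = {df_m:.1f}")
```

A two-tailed P value would follow from the t distribution with the returned degrees of freedom, for example through a statistics library.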
Extended Data Fig. 3 | Histone Kla is induced by hypoxia. a-d, Antibody
specificity was analysed by dot blot assay. ac, acetyl lysine; bhb,
β-hydroxybutyryl lysine; bu, butyryl lysine; cr, crotonyl lysine;
hib, 2-hydroxyisobutyryl lysine; la, lactyl lysine; pr, propionyl lysine; succ,
succinyl lysine; un, unmodified lysine. Kla library contains a mixture of
CXXXKlaXXXX peptides, in which C is cysteine, X is a mixture of all 19 amino
acids except for cysteine, and Kla is lactyl lysine. e, f, SILAC-MS/MS
quantification of histone Kla and Kac marks from MCF-7 cells, comparing
hypoxic (1% oxygen for 24 h) and normoxic conditions. SILAC ratio was
normalized to protein abundance. g, h, Immunoblots of histone Kla and Kac
from human HeLa and mouse RAW 264.7 cells in response to hypoxia (1%
oxygen) at the indicated time. i, j, Intracellular lactate levels (i) and histone Kla
levels (j) were measured in MCF-7 cells comparing normoxia, hypoxia (1%
oxygen, 24 h) and hypoxia in the presence of 10 mM oxamate or DCA.
k, l, Intracellular lactate levels (k) and histone Kla levels (l) were compared in
LDHA−/−, LDHB−/−, LDHA−/−LDHB−/− or wild-type (WT) HepG2 cells. Data are mean
and s.e.m. from three biological independent samples; statistical significance
was determined using one-way ANOVA followed by Dunnett's multiple
comparisons test. Data in a-d, g, h, k and l represent three independent
experiments.
Extended Data Fig. 4 | Histone Kla is induced during M1 macrophage
polarization. a-f, Quantitative proteomic analysis of histone extracts from M0
and M1 macrophages (BMDMs) cultured in the presence of U-¹³C₆-glucose for 3,
6, 12 and 24 h, or with 10 μM GNE-140 (LDHA/B inhibitor) for 24 h. g, Histone Kla
and Kac levels were analysed by immunoblots 24 h after activation by LPS and
IFNγ, with or without replenishing fresh media (containing LPS and IFNγ or not)
every 4 h. h, BMDM cells were stimulated with PBS (M0), LPS plus IFNγ (M1), and
IL-4 (M2) for 24 h. Intracellular lactate was measured using a lactate colorimetric
kit. Data are mean and s.e.m. from three biological independent samples;
statistical significance was determined using one-way ANOVA followed by
Dunnett's multiple comparisons test. i, j, Antibody specificity was evaluated by
ChIP-qPCR. Competition was carried out by pre-incubating the indicated
antibodies with a tenfold excess of corresponding peptides. k, H3K18la antibody
specificity was shown by full immunoblot using total lysate from MCF-7 cells
with or without 10 mM sodium L-lactate treatment for 24 h. l, H3K18la and
H3K18ac are enriched in promoter regions. The promoter was defined as
regions ±2 kb around known transcription start sites. m, n, H3K18la and
H3K18ac correlate with steady-state mRNA levels. The average ChIP signal
intensity (read count per million mapped reads) for indicated antibodies is
shown for genes with different expression levels (the top 25%, the second 25%,
the third 25%, and the bottom 25% of RNA-seq counts). o, p, IGV tracks for Arg1
and Crem from ChIP-seq analysis, representing data from a single experiment.
Data in a-f represent two independent experiments. Data in g, i-k represent
three independent experiments.
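The grouping of genes into expression quartiles described for panels m and n can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name, the gene names and all values are hypothetical, and ties in expression are broken arbitrarily by sort order.

```python
from statistics import mean


def mean_signal_by_quartile(expression, chip_signal):
    """Average ChIP signal per expression quartile.

    expression:  gene -> RNA-seq count
    chip_signal: gene -> ChIP read count per million mapped reads
    Returns four means, ordered from the top 25% to the bottom 25%.
    Requires at least four genes.
    """
    genes = sorted(expression, key=expression.get, reverse=True)
    n = len(genes)
    means = []
    for q in range(4):
        group = genes[q * n // 4:(q + 1) * n // 4]
        means.append(mean(chip_signal[g] for g in group))
    return means


# Hypothetical toy data: eight genes with made-up counts and signals.
expression = {f"gene{i}": count
              for i, count in enumerate([80, 70, 60, 50, 40, 30, 20, 10])}
chip_signal = {f"gene{i}": signal
               for i, signal in enumerate([8, 6, 5, 5, 3, 1, 1, 1])}
quartile_means = mean_signal_by_quartile(expression, chip_signal)
print(quartile_means)
```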
Extended Data Fig. 5 | Histone Kla-specific genes are associated with late
activated M2-like gene expression. a, b, Heat maps showing expression
kinetics of total genes (a) and H3K18la-specific genes (b) during M1 macrophage
polarization. n = 4 biological replicates. The colour key represents log₂-
transformed fold change relative to the mean of each row. Arrows next to the
heat maps refer to late activated genes (16-24 h) from H3K18la-specific or total
genes used for contingency test. c, Contingency table analysis (Fisher's exact
tests) shows the relation between specific H3K18la enrichment (H3K18la log₂-
transformed fold change ≥ 1 and H3K18ac log₂-transformed fold change < 0.5)
and late gene activation. d, Gene Ontology analysis (biological processes) of
H3K18la-specific genes. Statistical significance was determined by modified
Fisher's exact test (EASE score) using DAVID bioinformatics resources 6.8;
n = 1,223 genes. e-j, BMDM cells were infected with indicated Gram-negative
bacteria for 24 h. Intracellular lactate (e) and histone Kla levels (f) were measured
24 h after bacterial challenge. e, n = 3 biological replicates; statistical
significance was determined using one-way ANOVA followed by Dunnett's
multiple comparisons test. g-j, Gene expression was analysed by RT-qPCR at
indicated time points after bacterial challenge. k, Activities of iNOS and ARG1
were analysed using commercial kits from BMDMs activated by the
indicated stimuli. Data are mean and s.e.m. from three biological replicates.
Data in f and k represent three independent experiments.
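The contingency analysis in panel c can be illustrated with a small, self-contained sketch. It assumes that a gene counts as H3K18la-specific when its H3K18la log₂ fold change is at least 1 and its H3K18ac log₂ fold change is below 0.5; the gene records are hypothetical, and the two-sided Fisher's exact test is implemented directly from the hypergeometric distribution rather than taken from the authors' software.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


def is_h3k18la_specific(log2fc_la, log2fc_ac):
    """Assumed thresholds: H3K18la >= 1 and H3K18ac < 0.5 (log2 scale)."""
    return log2fc_la >= 1 and log2fc_ac < 0.5


# Hypothetical genes: (H3K18la log2FC, H3K18ac log2FC, late-activated?)
genes = [(1.5, 0.1, True), (1.2, 0.3, True), (2.0, 0.0, True), (1.1, 0.4, False),
         (0.2, 0.1, True), (0.5, 0.9, False), (1.4, 1.2, False), (0.0, 0.0, False)]
table = [[0, 0], [0, 0]]  # rows: specific / other; columns: late / not late
for la, ac, late in genes:
    table[0 if is_h3k18la_specific(la, ac) else 1][0 if late else 1] += 1
p_value = fisher_exact_two_sided(table[0][0], table[0][1], table[1][0], table[1][1])
print(table, round(p_value, 4))
```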
Extended Data Fig. 6 | Histone Kla levels are positively correlated with Arg1
expression in TAMs. a, The purity of TAMs and peritoneal macrophages (PMs)
was confirmed by flow cytometry using CD11b and F4/80 antibodies. b-e, Data
were quantified by FlowJo v.10.4.1. Histone Kla and Kac levels were analysed by
immunoblots (b), intracellular lactate was measured using a lactate colorimetric
assay kit (c), and gene expression of Arg1 and Vegfa were analysed by RT-qPCR
(d, e) from FACS-sorted peritoneal macrophages and TAMs within the tumour
from LLC and B16 tumours. Data in c-e are mean and s.e.m. n = 5 biological
independent samples; statistical significance was determined using one-way
ANOVA followed by Dunnett's multiple comparisons test. Data in a and b
represent five independent mice.
Extended Data Fig. 7 | Decreased lactate production lowered histone Kla
levels and Arg1 expression during M1 polarization. a, b, Genotyping of
Ldhafl/fl LysM-Cre+/− mice. c, Genotype validation by LDHA immunoblot
analysis. d-g, Gene expression analysis of cytokines by RT-qPCR at indicated
time points after M1 polarization. h-m, Intracellular lactate levels (h) were
analysed using a lactate colorimetric assay kit and global histone Kla levels (i)
were measured by immunoblots 24 h after M1 polarization. Inhibitors were
added 30 min after M1 polarization. Gene expression was analysed by RT-qPCR
at indicated time points after M1 polarization (j-m). Data are mean and s.e.m.
from three biological replicates. Statistical significance was determined using
one-way ANOVA followed by Dunnett's multiple comparisons test. Data in a-c
and i represent three independent experiments.
Extended Data Fig. 8 | Exogenous lactate activates M2-like gene expression
through histone Kla. a-d, Exogenous lactate (LA) does not interfere with gene
expression of inflammatory cytokines. Data are mean ± s.e.m. from four
biological replicates. e, Number of lactate-activated H3K18la-specific genes at
indicated times are shown in a Venn diagram. f, Gene Ontology analysis
(biological processes) of lactate-induced H3K18la-specific genes at 16 and 24 h
after M1 polarization. Statistical significance was determined by modified
Fisher's exact test (EASE score) using DAVID bioinformatics resources 6.8; n = 112
genes. g, Vegfa was induced by exogenous lactate during M1 macrophage
polarization; n = 4 biological replicates; statistical significance was determined
using multiple t-tests corrected using the Holm-Sidak method. h, H3K18la
occupancy at the Vegfa promoter was analysed by ChIP-qPCR at indicated time
and treatment; data represent three technical replicates from pooled samples.
i-m, HIF1α is not required for histone Kla-mediated Arg1 induction during M1
polarization. i, Immunoblot of HIF1α at indicated time points after M1
polarization. j, Illustration of genomic loci targeted by Arg1 and Vegfa ChIP-qPCR
primers. HRE indicates regions containing the putative HIF1α binding
motif 'ACGTG'. k-m, ChIP-qPCR analysis of HIF1α binding to indicated genomic
locations; data represent three technical replicates from pooled samples. Data
are mean and s.e.m. Data in i represent three independent experiments.
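Locating the putative HIF1α binding motif 'ACGTG' mentioned for panel j amounts to a simple string scan on both DNA strands. The sketch below is only an illustration: the function name and the example sequence are hypothetical and are not taken from the Arg1 or Vegfa loci.

```python
def find_hre_sites(sequence, motif="ACGTG"):
    """Return (position, strand) for every occurrence of the motif.

    Positions are 0-based on the forward strand. A '-' hit means that the
    reverse complement of the motif occurs at that forward-strand position.
    """
    complement = str.maketrans("ACGT", "TGCA")
    seq = sequence.upper()
    rev_motif = motif.translate(complement)[::-1]  # 'CACGT' for 'ACGTG'
    sites = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            sites.append((i, "+"))
        if window == rev_motif:
            sites.append((i, "-"))
    return sites


# Hypothetical sequence with one forward and one reverse-strand site.
example_sites = find_hre_sites("TTACGTGAACACGTCC")
print(example_sites)
```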
Extended Data Fig. 9 | Histone Kla directly stimulates gene transcription
from recombinant chromatin in vitro. a, Protocol for assembly, modification
and transcription of chromatin templates. b, p300 catalyses histone lactylation
in a p53-dependent manner. c, Histone lactylation directly stimulates
p53-dependent transcription from recombinant chromatin. d, H3 and H4
lysine-to-arginine (KR) mutations eliminate p300-dependent transcriptional activation
by p53. Recombinant chromatin was assembled with wild-type or H3KR, H4KR,
H2AKR or H2BKR mutant histones as indicated. e, HEK293T cells were
transfected with vector or Flag-tagged p300 plasmid. At 48 h after transfection,
whole-cell lysates were prepared and immunoblotted with indicated antibodies.
f, g, Immunoblots of histone Kla and Kac levels in HCT116 (f) and HEK293T cells (g)
in which p300 was genetically deleted. h-k, Quality control of synthesized
L-lactyl-CoA. h, Illustration of L-lactyl-CoA structure. i, j, HPLC analysis of the
synthesized L-lactyl-CoA. The UV detection wavelength was fixed at 214 and
254 nm. k, MALDI-mass spectrometry analysis of L-lactyl-CoA. Data in b-g
and i-k represent three independent experiments.
Statistics

For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.

n/a | Confirmed

[x] The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
[x] A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
[x] The statistical test(s) used AND whether they are one- or two-sided
    Only common tests should be described solely by name; describe more complex techniques in the Methods section.
[x] A description of all covariates tested
[x] A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons
[x] A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient)
    AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)
[x] For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted
    Give P values as exact values whenever suitable.
[x] For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
[x] For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
[x] Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated


Our web collection on statistics for biologists contains articles on many of the points above. 


Software and code 


Policy information about availability of computer code 


Data collection Mass spectrometry: Thermo Xcalibur 3.0.63 
Flow cytometry: FACSDiva 10.5.3 


Data analysis GraphPad 7.0 was used to perform general statistical analyses. 
ChIP-seq and RNAseq: 
FastQC version 0.11.4, Bowtie version 2.2.6, SAMtools version 0.1.19, MACS version 2.2.1, featureCounts version 1.5.0-p1, DAVID 
Bioinformatics Resources 6.8, HISAT2 version 2.1.0, edgeR version 3.16.5, and Perseus version 1.6.1.1. 
Mass spectrometry: MaxQuant 1.3.0.5. 
Flow Cytometry: FlowJo v.10.4.1. 


For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers.
We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.


Data 


Policy information about availability of data 


All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: 
- Accession codes, unique identifiers, or web links for publicly available datasets 
- A list of figures that have associated raw data 
- A description of any restrictions on data availability


The ChIP-seq and RNA-seq data have been made available at the Gene Expression Omnibus (GEO) repository under the accession number GSE115354. All other data 
are available from the authors upon reasonable request. 
Field-specific reporting


Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.


[x] Life sciences    [ ] Behavioural & social sciences    [ ] Ecological, evolutionary & environmental sciences


For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf 


Life sciences study design 


All studies must disclose on these points even when the disclosure is negative. 


Sample size Specific sample sizes are described in figures or figure legends for all experiments. The sample size used for animal experiments is based on
previous experience from the Zhao and Becker labs. No statistical test was used to pre-determine sample size.


Data exclusions No samples or animals were excluded from the analyses.
Replication The number of repeats for each experiment is described in the corresponding figure legends. All repeats support the same conclusion.
Randomization Cells or mouse tissues were randomly assigned to groups (chemical compound/hypoxia/other treatments).


Blinding For animal-related experiments, the investigators were divided into two groups: one group was responsible for collecting samples and the other
group was responsible for experiment and outcome assessment.


Reporting for specific materials, systems and methods 


We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, 
system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response. 


Materials & experimental systems
n/a | Involved in the study
[x] Antibodies
[x] Eukaryotic cell lines
[x] Palaeontology
[x] Animals and other organisms
[x] Human research participants
[x] Clinical data

Methods
n/a | Involved in the study
[x] ChIP-seq
[x] Flow cytometry
[x] MRI-based neuroimaging


Antibodies 


Antibodies used The following antibodies were generated by PTM Bio Inc (Chicago, IL): 
pan anti-Kac (PTM-101), 1:2000 (WB) 
pan anti-Kla (PTM-1401), 1:2000 (WB) 
anti-H3K18la (PTM-1406), 1:5000 (WB), 4 μg per ChIP
anti-H4K8la (PTM-1405), 1:5000 (WB) 
anti-H4K5la (PTM-1407), 1:5000 (WB) 


The following antibodies were generated by Abcam (Cambridge, MA): 
anti-histone H3 (ab12079), 1:10000(WB) 

anti-H3K18ac (ab1191), 1:10000 (WB), 4 μg per ChIP

anti-H3K27ac (ab4729), 1:5000 (WB) 


The following antibodies were generated by Active Motif (Carlsbad, CA): 
anti-drosophila spike-in antibody (61686), 2 μg per ChIP


The following antibodies were generated by Cell Signaling Technology (Danvers, MA): 
anti-LDHA (2012S), 1:2000 (WB) 


The following antibodies were generated by Millipore Sigma (Burlington, MA): 
anti-α-Tubulin (05-829), 1:5000 (WB)
anti-LDHB (ABC927), 1:2000 (WB) 
The following antibodies were generated by Novus Biologicals (Littleton, CO):
anti-HIF-1α (NB100-105), 1:2000 (WB)


The following antibodies were generated by GeneTex (Irvine, CA):
anti-iNOS (GTX130246), 1:2000 (WB)
anti-Arg1 (GTX109242), 1:2000 (WB)


The following antibodies were generated by Santa Cruz Biotechnology, Inc (Dallas, TX):
anti-p300 (sc-584), 1:2000 (WB)


The following antibodies were generated by ThermoFisher Scientific (Waltham, MA):
anti-CD11b Monoclonal Antibody (M1/70), PE-Cyanine7, eBioscience (25-0112-82), 0.125 μg/test (Flow)
anti-F4/80 Monoclonal Antibody (BM8), APC, eBioscience (17-4801-82), 2 μg/test (Flow)


The following antibodies were generated by Jackson ImmunoResearch Laboratories (West Grove, PA):
Peroxidase AffiniPure Goat Anti-Mouse IgG (H+L) (115-035-003), 1:10000 (WB)
Peroxidase AffiniPure Goat Anti-Rabbit IgG (H+L) (111-035-003), 1:10000 (WB)


Validation


Pan anti-Kac (PTM-101):
Species: human, mouse; Application: Western Blot, Immunoprecipitation; Manufacturer's web site: https://www.ptmbiolabs.com/product/ptm-101/
Pan anti-Kla (PTM-1401):
Species: human, mouse; Application: Dot Blot, Western Blot, Immunoprecipitation; Validated in this paper.
Anti-H4K8la (PTM-1405):
Species: human, mouse; Application: Western Blot; Validated in this paper.
Anti-H3K18la (PTM-1406):
Species: human, mouse; Application: Dot Blot, Western Blot, ChIP; Validated in this paper.
Anti-H4K5la (PTM-1407):
Species: human, mouse; Application: Dot Blot, Western Blot; Validated in this paper.


Anti-histone H3 (ab12079):
Species: human, mouse; Application: Western Blot; Manufacturer's web site: https://www.abcam.com/histone-h3-antibody-chip-grade-ab12079.html
Anti-H3K18ac (ab1191):
Species: human, mouse; Application: Western Blot, ChIP; Manufacturer's web site: https://www.abcam.com/histone-h3-acetyl-k18-antibody-chip-grade-ab1191.html
Anti-H3K27ac (ab4729):
Species: human, mouse; Application: Western Blot; Manufacturer's web site: https://www.abcam.com/histone-h3-acetyl-k27-antibody-chip-grade-ab4729.html


Anti-LDHA (2012S):
Species: human, mouse; Application: Western Blot; Manufacturer's web site: https://www.cellsignal.com/products/primary-antibodies/ldha-antibody/2012


Anti-α-Tubulin (05-829):
Species: human, mouse; Application: Western Blot; Manufacturer's web site: https://www.emdmillipore.com/US/en/product/Anti-Tubulin-Antibody-clone-DM1A,MM_NF-05-829
Anti-LDHB (ABC927):
Species: human, mouse; Application: Western Blot; Manufacturer's web site: https://www.emdmillipore.com/US/en/product/Anti-LDHB-Antibody,MM_NF-ABC927


Anti-drosophila spike-in antibody (61686):
Species: drosophila; Application: ChIP; Manufacturer's web site: https://www.activemotif.com/catalog/1091/chip-normalization


Anti-iNOS (GTX130246):
Species: mouse; Application: Western Blot; Manufacturer's web site: https://www.genetex.com/Product/Detail/iNOS-antibody/GTX130246
Anti-Arg1 (GTX109242):
Species: mouse; Application: Western Blot; Manufacturer's web site: https://www.genetex.com/Product/Detail/Arginase-1-antibody/GTX109242


anti-p300 (N15) (sc-584):
Species: human; Application: Western Blot; Manufacturer's web site: https://www.scbt.com/scbt/product/p300-antibody-n-15
p300 (N-15) has been discontinued and replaced by p300 (F-4): sc-48343.


Anti-CD11b Monoclonal Antibody (M1/70), PE-Cyanine7, eBioscience (25-0112-82):
Species: mouse; Application: Flow cytometry; Manufacturer's web site: https://www.thermofisher.com/antibody/product/CD11b-Antibody-clone-M1-70-Monoclonal/25-0112-82


Anti-F4/80 Monoclonal Antibody (BM8), APC, eBioscience (17-4801-82):
Species: mouse; Application: Flow cytometry; Manufacturer's web site: https://www.thermofisher.com/antibody/product/F4-80-Antibody-clone-BM8-Monoclonal/17-4801-82


Eukaryotic cell lines 


Policy information about cell lines 


Cell line source(s) HeLa, MCF-7, A549, MDA-MB-231, HepG2, MEF, and RAW264.7 cells were obtained from the American Type Culture
Collection.
Authentication HeLa, MCF-7, A549, MDA-MB-231, HepG2, MEF, and RAW264.7 cells were authenticated based on our extensive experience
working with these cell lines (such as cell morphology, culture conditions, etc.). Furthermore, we believe that the
modification described in the paper exists widely in various cell lines and is not specific to certain cell types.
Mycoplasma contamination Cells were routinely tested for mycoplasma contamination, and only negative cells were used in experiments.


Commonly misidentified lines (See ICLAC register) None of the cell lines used are listed in the database of commonly misidentified cell lines maintained by ICLAC.


Animals and other organisms 


Policy information about studies involving animals; ARRIVE guidelines recommended for reporting animal research 


Laboratory animals Adult male mice (Mus musculus, C57BL/6, 7-10 weeks old) were purchased from The Jackson Laboratory, and were used to
generate bone-marrow-derived macrophages (BMDMs). Ldhafl/fl mice (Jackson Laboratory, 030112) and LysMcre mice (Jackson
Laboratory, 004781) were used to generate LysMcre+/- Ldhafl/fl and littermate control LysMcre-/- Ldhafl/fl mice.


Wild animals The study did not involve samples collected from wild animals. 
Field-collected samples The study did not involve samples collected from the field. 
Ethics oversight All animal protocols were approved by the Institutional Animal Care and Use Committee (ACUP) at the University of Chicago.


Note that full information on the approval of the study protocol must also be provided in the manuscript. 


ChIP-seq 


Data deposition 


[x] Confirm that both raw and final processed data have been deposited in a public database such as GEO.


[x] Confirm that you have deposited or provided access to graph files (e.g. BED files) for the called peaks.


Data access links https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE115354 
May remain private before publication. 


Files in database submission GSM3176446_RH_148_peaks.broadPeak.gz 
GSM3176447_RH_229_peaks.broadPeak.gz 
GSM3176449_RH_151_peaks.broadPeak.gz 
GSM3176450_RH_231_peaks.broadPeak.gz 
GSE115354_ChiP-seq_normalization.txt.gz 


Genome browser session (e.g. UCSC) N/A

Methodology

Replicates ChIP-seq samples were prepared as one replicate, pooled from four mice.

Sequencing depth RH_148 (unique reads: 21765076; spike-in reads: 310066)
RH_151 (unique reads: 21006731; spike-in reads: 141688)
RH_229 (unique reads: 36576826; spike-in reads: 52482)
RH_231 (unique reads: 32948615; spike-in reads: 50470)

Antibodies The anti-H3K18la antibody was generated by PTM Biolabs. The process for generating the antibody was similar to that described in
Cell, 2011. 146: p. 1016-1028; Mol Cell, 2015. 58(2): p. 203-15; and Nat Chem Biol, 2014. 10(5): p. 365-70, except that
different immunogens were used.


The anti-H3K18ac antibody was purchased from Abcam (ab1191, lot GR 300534-1).


The evaluation of the antibody for specificity and ChIP grade is provided in the manuscript.
Spike-in information: spike-in chromatin (Active Motif, Catalog No. 53083), spike-in antibody (Active Motif, Catalog No.
61686).
Peak calling parameters Peaks were called using MACS version 2.2.1 under q value = 0.01.
Data quality Sequencing quality was evaluated by FastQC version 0.11.4. All reads were mapped to the reference genome of Illumina
iGenomes UCSC mm10 using Bowtie version 2.2.6, and only uniquely mapped reads were retained. SAMtools version
0.1.19 was used to convert files to bam format, sort, and remove PCR duplicates. Peaks were called using MACS version
2.2.1 under q value = 0.01.


The number of peaks at the cutoff threshold in each sample: 
21885 peaks in GSM3176446_RH_148_peaks.broadPeak.gz 
41493 peaks in GSM3176447_RH_229_peaks.broadPeak.gz 
16139 peaks in GSM3176449_RH_151_peaks.broadPeak.gz 
42237 peaks in GSM3176450_RH_231_peaks.broadPeak.gz 
Software Bases were called by Real-Time Analysis (RTA).
Reads were mapped to the reference genome of Illumina iGenomes UCSC mm10 using Bowtie version 2.2.6, and only uniquely
mapped reads were retained.
SAMtools version 0.1.19 was used to convert files to bam format, sort, and remove PCR duplicates.
Peaks were called using MACS version 2.2.1 under q value = 0.01.
Uniquely mapped reads of each gene were counted by featureCounts version 1.5.0-p1, and normalized by corresponding
uniquely mapped spike-in ChIP read counts.
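One common way to apply spike-in normalization is to scale each sample by the ratio of the smallest spike-in read count to its own spike-in read count. The sketch below uses the spike-in read counts listed under sequencing depth. The choice of the smallest count as the reference is an assumption (the exact formula used in the study is not given here), and the function names and the gene counts in the example are hypothetical.

```python
# Spike-in read counts as listed under "Sequencing depth".
spike_in_reads = {
    "RH_148": 310066,
    "RH_151": 141688,
    "RH_229": 52482,
    "RH_231": 50470,
}


def spike_in_factors(spike_reads):
    """Scale factor per sample: smallest spike-in count / sample spike-in count.

    Assumption: the sample with the fewest spike-in reads is the reference,
    so its factor is 1 and all other samples are scaled down.
    """
    reference = min(spike_reads.values())
    return {sample: reference / reads for sample, reads in spike_reads.items()}


def normalize_counts(gene_counts, factor):
    """Multiply per-gene read counts by a sample's spike-in scale factor."""
    return {gene: count * factor for gene, count in gene_counts.items()}


factors = spike_in_factors(spike_in_reads)
# Hypothetical per-gene counts for one sample, for illustration only.
normalized = normalize_counts({"geneA": 1000, "geneB": 200}, factors["RH_148"])
print(factors)
print(normalized)
```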


Flow Cytometry 


Plots 


Confirm that: 


[x] The axis labels state the marker and fluorochrome used (e.g. CD4-FITC).


[x] The axis scales are clearly visible. Include numbers along axes only for bottom left plot of group (a 'group' is an analysis of identical markers).


[x] All plots are contour plots with outliers or pseudocolor plots.


[x] A numerical value for number of cells or percentage (with statistics) is provided.


Methodology 
Sample preparation 0.2 million cells were labelled with different fluorophore-conjugated antibodies at room temperature for 15 min, followed by two
washes.
Instrument Samples were analyzed using a FACSCanto™ II flow cytometer.
Software Data were quantified by FlowJo v.10.4.1.


Cell population abundance Purity of both TAMs and peritoneal macrophages (Pmac) was above 95% based on the F4/80 and CD11b cell-surface markers. Positive gating was determined
by negative control.


Gating strategy Cells were gated by FSC/SSC for total population --> SSC-A/SSC-H for single cells --> cblue labeling for live cells population -->
F4/80 and CD11b double positive (compared to negative population) for purity check.
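The sequential gating described above can be sketched as a chain of filters, with purity taken as the fraction of live single cells that are F4/80 and CD11b double positive. All thresholds, field names and events below are hypothetical and serve only to show the order of the gates; the actual analysis was performed in FlowJo.

```python
def sequential_gate(events, gates):
    """Apply gates in order; return surviving events and the count after each gate."""
    counts = []
    for gate in gates:
        events = [e for e in events if gate(e)]
        counts.append(len(events))
    return events, counts


# Hypothetical thresholds, for illustration only.
gates = [
    lambda e: e["fsc"] > 200,                          # FSC/SSC: total population
    lambda e: 0.8 <= e["ssc_a"] / e["ssc_h"] <= 1.2,   # SSC-A/SSC-H: single cells
    lambda e: e["live"],                               # viability stain: live cells
    lambda e: e["f480"] > 1000 and e["cd11b"] > 1000,  # double positive
]

# Hypothetical events.
events = [
    {"fsc": 500, "ssc_a": 100, "ssc_h": 100, "live": True, "f480": 5000, "cd11b": 4000},
    {"fsc": 500, "ssc_a": 100, "ssc_h": 100, "live": True, "f480": 200, "cd11b": 4000},
    {"fsc": 100, "ssc_a": 100, "ssc_h": 100, "live": True, "f480": 5000, "cd11b": 4000},
    {"fsc": 600, "ssc_a": 300, "ssc_h": 100, "live": True, "f480": 5000, "cd11b": 4000},
    {"fsc": 500, "ssc_a": 100, "ssc_h": 100, "live": False, "f480": 5000, "cd11b": 4000},
]

_, counts = sequential_gate(events, gates)
# Purity: double-positive events as a fraction of live single cells.
purity = counts[-1] / counts[-2] if counts[-2] else 0.0
print(counts, purity)
```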


[x] Tick this box to confirm that a figure exemplifying the gating strategy is provided in the Supplementary Information.
The tricarboxylic acid cycle intermediate succinate is involved in metabolic processes
and plays a crucial role in the homeostasis of mitochondrial reactive oxygen species.
The receptor responsible for succinate signalling, SUCNR1 (also known as GPR91), is a
member of the G-protein-coupled-receptor family and links succinate signalling to
renin-induced hypertension, retinal angiogenesis and inflammation. Because
SUCNR1 senses succinate as an immunological danger signal (which has relevance for
diseases including ulcerative colitis, liver fibrosis, diabetes and rheumatoid
arthritis), it is of interest as a therapeutic target. Here we report the high-resolution
crystal structure of rat SUCNR1 in complex with an intracellular binding nanobody in
the inactive conformation. Structure-based mutagenesis and radioligand-binding
studies, in conjunction with molecular modelling, identified key residues for
species-selective antagonist binding and enabled the determination of the high-resolution
crystal structure of a humanized rat SUCNR1 in complex with a high-affinity,
human-selective antagonist denoted NF-56-EJ40. We anticipate that these structural insights
into the architecture of the succinate receptor and its antagonist selectivity will enable
structure-based drug discovery and will further help to elucidate the function of
SUCNR1 in vitro and in vivo.


Under certain conditions (including ischaemia reperfusion, hypoxia and 
in classically activated macrophages) there is an increase in intracellular 
succinate levels in the mitochondria and cytosol. Ischaemia reperfusion 
induces succinate dehydrogenase to catalyse its reverse reaction; the 
subsequent increase in mitochondrial succinate then drives, through 
reversal of the electron-transport chain, the production of reactive 
oxygen species and, ultimately, tissue damage. In classically activated 
glycolytic macrophages, increased levels of cytoplasmic succinate lead 
to stabilization of HIF1α and enhanced production of pro-inflammatory 
factors through glutamine-dependent anaplerosis and the ‘GABA 
(γ-aminobutyric acid) shunt’ pathway. The role of SUCNR1 in these 
intracellular processes is unknown; however, hypoxia, necrosis and 
inflammation also result in the extracellular accumulation of succinate, 
where it triggers SUCNR1 signalling. The succinate-SUCNR1 axis drives 
a range of events in many tissues, including eye, kidney, liver and gut 
(Extended Data Fig. 1), and has been associated with metabolic 
indications. Conversely, however, SUCNR1 signalling in myeloid cells is 
reported to resolve acute inflammation in obesity. 

Small non-metabolite agonists of SUCNR1 have been identified using 
molecular modelling and docking based on structures of the P2Y1 
receptor. High-affinity and selective antagonists have also been reported, 
but structural details of the interaction between antagonists and SUCNR1 
remain unknown. Despite extensive mutagenesis studies investigating 
the species-selectivity of SUCNR1 agonists, it is not clear how the 
differences in structure-activity relationships arise among the SUCNR1 
orthologues. 

To enable structure determination, we screened human and rat 
SUCNR1 orthologues for optimal expression, purification and 
stabilization, as well as screening various fusion proteins and detergents 
typically used to stabilize G-protein-coupled receptors (GPCRs). We 
selected the rat orthologue on the basis of its lack of any glycosylation 
site and its increased biochemical stability in comparison with human 
SUCNR1. 

We generated nanobodies to stabilize and trap the receptor in a 
single conformation, according to established protocols. Nanobody6 
increased the thermal stability of wild-type rat SUCNR1, formed a stable 
complex with the receptor and acted as a negative allosteric modulator 
in a [35S]GTPγS assay, and could thus potentially stabilize an inactive 
conformation of the receptor (Extended Data Fig. 2a-d). Nanobody6 
also formed a stable complex with human SUCNR1, and showed a 
similar, albeit much weaker, modulation of receptor activity (Extended 
Data Fig. 2e, f). The unusually high thermal stability of rat SUCNR1, 
in conjunction with the conformational stability provided by 
Nanobody6, enabled us to crystallize the full-length wild-type rat SUCNR1- 
Nanobody6 complex. We obtained crystals in lipidic cubic phase. The 
crystals diffracted anisotropically to a resolution of 2.1 Å (Extended Data 
Fig. 3a, b, Extended Data Table 1), and the resulting electron density 
maps were of excellent quality (Extended Data Fig. 3c, d). 


1Chemical Biology & Therapeutics, Novartis Institutes for BioMedical Research, Novartis Pharma AG, Basel, Switzerland. 2Global Discovery Chemistry, Novartis Institutes for BioMedical 
Research, Novartis Pharma AG, Basel, Switzerland. 3Autoimmunity, Transplantation and Inflammation, Novartis Institutes for BioMedical Research, Novartis Pharma AG, Basel, Switzerland. 
4Present address: Confo Therapeutics, Zwijnaarde, Belgium. *e-mail: matthias.haffke@novartis.com; klemens.kaupmann@novartis.com; veli-pekka.jaakola@confotherapeutics.com 

Fig. 1 | Structure of the apo rat SUCNR1-Nanobody6 complex. a, Side view of 
the rat SUCNR1-Nanobody6 complex. SUCNR1 is coloured in blue to red from N- 
to C terminus. Glycerol and 2,5-hexanediol, identified in electron-density maps, 
are shown as sphere models. The potential orthosteric ligand-binding site is 
shown as a grey surface. b, Detailed view of the SUCNR1-Nanobody6 interface. 
The CDR3 of Nanobody6 is shown in red and side chains of important residues 
are shown as sticks. 


Our structure of the rat SUCNR1-Nanobody6 complex involves the 
receptor in an inactive conformation, based on comparisons with 
structures of the β2-adrenergic receptor (β2-AR) and the P2Y1 receptor 
(Extended Data Fig. 4a, b). Nanobody6 binds to the intracellular side of 
the receptor (Fig. 1a), at an interface that is similar to but distinct from 
that of the Gα subunit of the G-protein trimer (Extended Data Fig. 4c, 
d), and the interaction surface area is around 750 Å². In comparison with 
other GPCR-nanobody complexes, Nanobody6 has an unusually 
long complementarity-determining region 3 (CDR3) (Extended Data 
Fig. 4e, f). The interface between SUCNR1 and Nanobody6 is stabilized 
by several hydrogen bonds and hydrophobic interactions, involving 
CDR3 of Nanobody6 and intracellular loop 1 (ICL1), ICL2, ICL4 and 
helix VIII in SUCNR1 (Fig. 1b). The binding site for Nanobody6 on the 
receptor is highly conserved between rat, human and mouse SUCNR1 
(Extended Data Fig. 5a, b). In view of its overlapping binding site with 
Gα, Nanobody6 is expected to affect both Gi- and Gq-mediated SUCNR1 
signalling. 

We purified and crystallized the receptor-nanobody complex in the 
presence of a previously reported antagonist (compound 5g); however, 
we did not observe any electron density in our structure for this 
compound. Possible explanations for the missing ligand density could be the 
low solubility of compound 5g, potential competition for binding with 
Nanobody6, or absorption into the lipidic cubic phase. We modelled two 
positions with unexplained electron density as glycerol and 
2,5-hexanediol, respectively (Extended Data Fig. 3e-j). Although both influence 
the stability of the receptor, we could not attribute any functional effects 
to these molecules, and so categorize them as crystallization artefacts 
(Extended Data Fig. 2g-k). The glycerol molecule is located at a positively 
charged entry site to a hydrophobic pocket, which is surrounded by helix 
I, helix II, extracellular loop 1 (ECL1), helix III, helix VI and helix VII, and is 
partially occluded from solvent by ECL2. The lower part of this pocket 
is highly conserved, and is faced by side chains that were previously 
reported to be important for receptor activation by succinate (R95³.²⁹, 
H99³.³³, R248⁶.⁵⁵ and R276⁷.³⁹; superscripts denote Ballesteros-Weinstein 
numbering for GPCRs; Extended Data Fig. 6a, b). H99³.³³ and R248⁶.⁵⁵ 
constitute an intricate interhelical hydrogen-bond network with Y103³.³⁷ 
and Y244⁶.⁵¹: this network stabilizes the overall receptor structure and the 
lower region of the hydrophobic pocket. R95³.²⁹ and S180 are involved 
in the positioning of ECL2 by binding to the backbone carbonyl of V169 
and the hydroxyl group of Y171, respectively. The highly conserved 
residue D170 further stabilizes ECL2 by forming a hydrogen bond with 
Y272⁷.³⁵ (Extended Data Fig. 6c). These residues could be important for 
stabilizing the interhelical receptor architecture and for retaining ECL2 
close to the orthosteric ligand-binding site. The precise positioning of 
ECL2 might therefore be a requirement for the formation of a functional 
succinate-binding site. It has previously been suggested that ECL2 in 
SUCNR1 might function as a tethered inverse agonist held in place by 
R95³.²⁹: however, this seems unlikely when considering the molecular 
architecture around ECL2. Our structure of the receptor in the inactive 
conformation does not enable us to draw conclusions as to how SUCNR1 
achieves selectivity for succinate over other very similar dicarboxylates. 

We performed a high-throughput screen to identify new SUCNR1 
antagonists. 3-(4’-chloro-[1,1’-biphenyl]-3-carboxamido)-3-(pyridin-3-yl)propanoic 
acid (JC-59-GF68) was identified as a hit, with moderate 
activity in a [35S]GTPγS assay with human SUCNR1. It was optimized via the 
intermediate 2-(2-(4’-chloro-[1,1’-biphenyl]-3-carboxamido)phenyl)acetic 
acid (PB-20-OV24) to the high-potency antagonist 
2-(2-(4’-((4-methylpiperazin-1-yl)methyl)-[1,1’-biphenyl]-3-carboxamido)phenyl)acetic acid 
(NF-56-EJ40) (Fig. 2a-c, Supplementary Methods). NF-56-EJ40 was highly 
selective for human SUCNR1 and showed almost no activity towards rat 
SUCNR1. We then established a radioligand-binding assay for human 
SUCNR1 using [3H]-labelled NF-56-EJ40 (dissociation constant, Kd = 33 
nM; Fig. 2d, e), and investigated the potential orthosteric ligand-binding 
site of human SUCNR1 by mutagenesis. Residues for mutagenesis were 
selected on the basis of our structure of rat SUCNR1, differences in 
primary sequence between rat and human SUCNR1 orthologues (Extended 
Data Fig. 5) and molecular-docking analysis of NF-56-EJ40 binding to 
a human SUCNR1 model (Extended Data Fig. 7a). Molecular-docking 
analysis suggested that NF-56-EJ40 is bound deep inside the hydrophobic 
pocket, with the acid group coordinated by the hydroxyl groups of the 
conserved residues Y83².⁶⁴ and Y30¹.³⁹ on one side, and R281⁷.³⁹ on the 
other side. The conserved E18¹.²⁷ was predicted to form an additional 
hydrogen bond to the piperazine ring of NF-56-EJ40. E22¹.³¹ and N274⁷.³² in 
human SUCNR1 are replaced by K18¹.³¹ and K269⁷.³² in rat SUCNR1. These 
two amino acid exchanges could prevent the binding of NF-56-EJ40 to rat 
SUCNR1 owing to steric hindrance, providing a possible rationale for the 
observed species selectivity. Radioligand-binding studies with human 
SUCNR1 (Extended Data Table 2, Extended Data Fig. 7b) showed partial 
agreement with our homology model: the Y30¹.³⁹F mutant of human 
SUCNR1 (in which the tyrosine at position 30¹.³⁹ was mutated to 
phenylalanine), which was expected to disrupt coordination of the acid moiety, 
showed reduced binding of NF-56-EJ40. Similar effects were observed 
with the E18¹.²⁷K and E18¹.²⁷R mutants, probably owing to steric clashes of 
the Lys and Arg residues with NF-56-EJ40 and the loss of a hydrogen bond 
to its piperazine ring. We next introduced the corresponding rat SUCNR1 
residues into human SUCNR1 by preparing the E22¹.³¹K/N274⁷.³²K double 
mutant. In this case, we did not obtain a reliable inhibitory constant (Ki) 
owing to a low extent of radioligand binding. However, the reverse case, 
in which human SUCNR1 residues were introduced into rat SUCNR1 to 
form the double mutant K18¹.³¹E/K269⁷.³²N (hereafter denoted 
humanized rat SUCNR1), bound NF-56-EJ40 with an affinity almost equal to that 
observed for human SUCNR1 (Ki = 17.4 ± 2.5 nM and Ki = 33.5 ± 7.0 nM for 
human and humanized rat SUCNR1, respectively; Fig. 2f). To further 
confirm our observations, we purified humanized rat SUCNR1 and found 
that NF-56-EJ40 increased the thermal stability of both humanized rat 
SUCNR1 and human SUCNR1, but not that of rat SUCNR1 (Fig. 2g-i). Taken 
together, our studies identified E22¹.³¹ and N274⁷.³² as key determinants 
for the species selectivity of NF-56-EJ40 binding. 
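
The binding quantities quoted above (a radioligand Kd and competition-derived Ki values) are related by standard one-site binding expressions. The sketch below is illustrative only: the text does not state how Ki was calculated, so the Cheng-Prusoff conversion is shown as an assumption, the function names are hypothetical, and the IC50 and radioligand concentration in the example are made-up numbers. Only the Kd of 33 nM comes from the text.

```python
def specific_binding(total, non_specific):
    """Specific binding = total binding - non-specific binding (same units)."""
    return total - non_specific


def fractional_occupancy(ligand_nM, kd_nM):
    """One-site isotherm: fraction of receptor occupied at a free ligand concentration."""
    return ligand_nM / (kd_nM + ligand_nM)


def cheng_prusoff_ki(ic50_nM, radioligand_nM, kd_nM):
    """Convert a competition IC50 to Ki, assuming simple competitive one-site binding."""
    return ic50_nM / (1.0 + radioligand_nM / kd_nM)


KD_NM = 33.0  # Kd of [3H]NF-56-EJ40 at human SUCNR1, as reported in the text

# At a free radioligand concentration equal to Kd, half of the sites are occupied.
half_occupied = fractional_occupancy(KD_NM, KD_NM)

# Hypothetical competition experiment (values chosen for easy arithmetic).
example_ki = cheng_prusoff_ki(ic50_nM=66.0, radioligand_nM=33.0, kd_nM=KD_NM)
```

With the radioligand present at its Kd, the correction factor is 2, so the hypothetical IC50 of 66 nM corresponds to a Ki of 33 nM.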

On the basis of these data, we purified, crystallized and determined the 
structure of the humanized rat SUCNR1-Nanobody6-NF-56-EJ40 
complex (Extended Data Fig. 8a, b). The crystals diffracted anisotropically 

Fig. 2 | Development of a human SUCNR1 antagonist and humanization of rat 
SUCNR1. a, Chemical structures of SUCNR1 antagonists. b, c, [35S]GTPγS assay 
with human SUCNR1 (b) and rat SUCNR1 (c) in the presence of 50 µM succinate. 
d, Total and non-specific binding of the radioligand [3H]NF-56-EJ40 to human 
SUCNR1, determined in the presence of 10 mM succinate. e, Specific binding of 
[3H]NF-56-EJ40 calculated from d (total binding - non-specific binding); 
Kd = 33 nM. f, Radioligand competition binding experiment with wild-type 


to a resolution of less than 2 Å (Extended Data Table 1), and the resulting 
electron density enabled the unambiguous placement of NF-56-EJ40 in 
the orthosteric ligand-binding site (Fig. 3a, b, Extended Data Fig. 8c-e). 
Notably, the binding mode of NF-56-EJ40 was considerably different 
from that suggested by our molecular-docking studies, and binding 
was accompanied by large structural rearrangements in helix IV, ECL1 


human and humanized rat SUCNR1. For b-f, curves were calculated from n = 3 
independent experiments (data are mean ± s.d.; individual data points are 
shown). g-i, Thermal stability assays by nano-differential scanning fluorimetry. 
Bars represent average melting temperature (Tm), and the data points from n = 3 
technical replicates are shown as individual circles. ΔTm values compared to the 
control are indicated. The experiment was repeated independently twice with 
similar results. 


and in particular ECL2 close to the ligand-binding pocket (Extended 
Data Fig. 8f, g, i-k). Although the acid moiety is in a similar position to 
that predicted, the overall binding location of NF-56-EJ40 is about 6 Å 
deeper with respect to the biphenyl and piperazine groups. As 
suggested by mutagenesis data, the K18¹.³¹E mutation is critical for ligand 
binding because it restores a key receptor-ligand interaction (Extended 


Fig. 3 | Binding mode of the antagonist NF-56-EJ40 
to humanized rat SUCNR1. a, b, Humanized rat 
SUCNR1 (blue) bound to NF-56-EJ40 (green) in side 
view (a) and top view (b). For clarity, ECL2, helix IV 
and helix V are omitted in the side view. Only key 
residues within 4 Å of NF-56-EJ40 are shown (as 
sticks); dashed black lines represent hydrogen 
bonds. Note the water-mediated hydrogen bond 
between H99³.³³ and the amide oxygen in NF-56-EJ40. 

Fig. 4 | Structural similarities between SUCNR1 and the P2Y1 receptor. a, Side 
view of apo rat SUCNR1 and the P2Y1 receptor (PDB ID: 4XNW). P2Y1, orange; 
SUCNR1, blue. b, Top view of aligned receptors. Respective ligands (MRS2500 
and BPTU for P2Y1; glycerol or NF-56-EJ40 for SUCNR1) are shown as green or 
yellow sticks, respectively, with the orthosteric ligand-binding pockets in 
surface representation. Dashed red circles denote large structural differences 
between apo rat SUCNR1 and P2Y1 or NF-56-EJ40-bound humanized rat SUCNR1 
and P2Y1. c, Structural differences in the orthosteric ligand-binding site 


Data Fig. 8g, j). By contrast, the steric hindrance that was predicted for 
K269⁷.³² was not observed. Binding of the acid moiety of NF-56-EJ40 is 
mediated via the hydroxyl groups of Y79².⁶⁴ on one side and by Y26¹.³⁹ 
and R276⁷.³⁹ on the other side, a different coordination environment 
than that predicted. Additional cation-π interactions with R95³.²⁹ and 
van der Waals interactions with L75².⁶⁰, W84 and L98³.³² complete the 
coordination of NF-56-EJ40, and provide a complex and multivalent 
ligand-binding mode (Fig. 3a, b, Extended Data Fig. 8h). 

SUCNR1 has a high structural similarity to the P2Y1 receptor (root 
mean square deviation of 1.4 Å to P2Y1; Protein Data Bank (PDB) ID: 
4XNW), as has been suggested previously (Fig. 4a). Similar to P2Y1, 
SUCNR1 contains two disulfide bridges, which position ECL2 close to the 
orthosteric ligand-binding site (C91³.²⁵ to C168) and stabilize the 
structure between the N terminus and TM7 of the receptor (C7 to C263⁷.²⁶). In 
P2Y1, binding of the antagonist MRS2500 involves the β-hairpin formed 
by ECL2, and thus results in a partially closed binding pocket. Our 
SUCNR1 structures, by contrast, show a rather open extracellular side, in 
which ECL2 is partially unstructured in the ligand-free form and is 
noticeably outward-shifted in the antagonist-bound form (Fig. 4a, b). Notably, 

between NF-56-EJ40-bound humanized rat SUCNR1 and P2Y1 bound to 
MRS2500 (PDB ID: 4XNW). Red arrows indicate amino acid side chains in P2Y1 
that clash with NF-56-EJ40 (K46, Y110².⁶³). d, e, Comparison of the BPTU- 
binding sites of P2Y1 and apo rat SUCNR1 (d) and of P2Y1 and NF-56-EJ40-bound 
humanized rat SUCNR1 (e). BPTU is shown as green sticks. Key residues within 
4 Å of BPTU are shown for both receptors. Dashed lines show the conserved 
binding mode, via hydrogen bonds, to the carbonyl of L102².⁵⁵ in P2Y1 and C70².⁵⁵ 
in rat SUCNR1. 


although the orthosteric ligand-binding sites in both receptors differ 
substantially (Fig. 4c), we find that some of the key structural features 
that are important for binding of the allosteric antagonist and 
antiplatelet agent 1-[2-(2-tert-butylphenoxy)pyridin-3-yl]-3-[4-(trifluoromethoxy)phenyl]urea 
(BPTU) to P2Y1 are also present in SUCNR1. Similar to P2Y1, 
P73².⁵⁸ in SUCNR1 precludes intrahelical hydrogen bonding, and the 
carbonyl of C70².⁵⁵ is thus available for interaction with the nitrogen 
atoms of the urea group in BPTU. However, in contrast to the mainly 
hydrophobic environment in P2Y1 (formed by T103².⁵⁶, M123³.²⁴, L126³.²⁷ 
and Q127³.²⁸) that accommodates the aryl group of BPTU, the residues 
S93³.²⁷ and N94³.²⁸ in SUCNR1 would be available for hydrogen bonds and 
thus would favour electrostatic interactions (Fig. 4d, e). These structural 
similarities between P2Y1 and SUCNR1 therefore highlight the potential 
to develop allosteric SUCNR1 antagonists based on a BPTU scaffold. 
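
The root mean square deviation quoted above for SUCNR1 and P2Y1 is, in general, computed over matched atom pairs after the two structures have been superimposed. The sketch below shows only the final distance calculation. The function name, the toy coordinates and the assumption that superposition has already been done are illustrative, and the text does not state which atoms or which software were used for the reported 1.4 Å value.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of (x, y, z) points.

    Assumes the points are already paired atom by atom and that the two
    structures have already been superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


# Toy example: every atom in the second set is displaced by 1 Å along z.
set_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
set_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
toy_rmsd = rmsd(set_a, set_b)
```

Because the measure averages squared distances, a few strongly displaced loops (such as ECL2 here) can dominate the value even when the helical core aligns closely.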
These high-resolution crystal structures of SUCNR1 are, to our 
knowledge, the first structures of a dicarboxylic acid metabolite receptor. We 
anticipate that these structures, in conjunction with a newly generated 
nanobody and a high-affinity antagonist radioligand, will advance the 
characterization of SUCNR1 in metabolic and immunological disease settings. 

Methods 


Data reporting 

No statistical methods were used to predetermine sample size. The 
experiments were not randomized and the investigators were not 
blinded to allocation during experiments and outcome assessment. 


Nanobody6 

The sequence encoding Nanobody6 was cloned into a pIEx/Bac-3- 
derived plasmid with a haemagglutinin signal sequence followed by a 
Flag tag at the N terminus of the nanobody and a 10xHis-tag preceded by 
a human rhinovirus (HRV) 3C protease cleavage site at the C terminus. 

High-titre recombinant baculovirus was generated with the FlashBac 
Gold system according to the manufacturer’s instructions and used to 
infect Spodoptera frugiperda (Sf9) insect cells at a density of 2.0 × 10⁶ 
cells per ml, which were incubated for 5 days at 27 °C in shaker flasks. The 
supernatant was collected by centrifugation at 10,000g for 20 min at 4 °C, 
adjusted to 25 mM Tris/HCl pH 8.0, 300 mM NaCl, 5 mM CaCl2 and 1 mM 
CoCl2 and incubated at 4 °C for 30 min. The precipitate was removed 
by filtration with glass-fibre pre-filters on top of 0.45-µm Stericup filter 
devices (Millipore) and the supernatant was subsequently concentrated 
8 times with a tangential flow filtration device (Millipore) with a 10 kDa 
molecular weight cut-off (MWCO) membrane. 

Talon resin (Clontech), equilibrated in wash buffer (25 mM HEPES 
pH 8.0, 300 mM NaCl, 10 mM imidazole), was added to the supernatant 
and incubated on a shaker at 4 °C for 1 h. The resin was collected by 
centrifugation at 1,500g for 20 min, transferred to a gravity flow column 
and washed with 30 column volumes of wash buffer before eluting the 
protein with 5 column volumes of elution buffer (25 mM HEPES pH 8.0, 
300 mM NaCl, 250 mM imidazole). His-tagged HRV 3C protease 
(prepared in-house) was added at a 1:50 (w/w) ratio to cleave the C-terminal 
10xHis-tag and the sample was dialysed against a 20-times excess of 
dialysis buffer (25 mM HEPES pH 8.0, 300 mM NaCl) at 4 °C overnight. Cleaved 
Nanobody6 was passed through Ni-NTA resin in a gravity flow column 
(equilibrated in dialysis buffer) to remove His-tagged HRV 3C protease, 
uncleaved Nanobody6 and the free 10xHis-tag. The flow-through was 
concentrated to a final volume of less than 4 ml and further purified by 
size-exclusion chromatography on a Superdex S75 16/60 column (GE 
Healthcare), equilibrated in dialysis buffer. Protein-containing fractions 
were pooled, concentrated to around 7 mg ml⁻¹ with a 10 kDa MWCO 
concentrator (Millipore), flash-frozen in small aliquots in liquid nitrogen 
and stored at −80 °C until use. Final yields were about 4 mg of protein per 
litre of expression culture. Mass spectrometry confirmed the presence 
of two disulfide bridges in Nanobody6. 


Rat SUCNR1 

Wild-type rat SUCNR1 (residues 2 to 317) was cloned into a pIEx/Bac- 
3-derived plasmid with a haemagglutinin signal sequence followed by 
a Flag tag at the N terminus of the receptor and a 10xHis-tag preceded 
by a HRV 3C protease cleavage site at the C terminus. 

High-titre recombinant baculovirus was generated with the FlashBac 
Gold system according to the manufacturer’s instructions and used to 
infect Spodoptera frugiperda (Sf9) insect cells at a density of 2.0 × 10⁶ 
cells per ml, which were incubated for 3 days at 27 °C in shaker flasks. The cells 
were collected by centrifugation at 10,000g for 20 min at 4 °C, frozen and 
stored at −20 °C until further use. Membranes were prepared by lysing 
cells in hypotonic buffer (10 mM HEPES pH 7.5, 10 mM MgCl2, 20 mM 
KCl and EDTA-free complete protease inhibitor cocktail tablets (Roche)) 
followed by centrifugation at 45,000 r.p.m. for 45 min in a Ti-45 rotor at 
4 °C. The low-salt wash was repeated twice, followed by three washes in 
high-salt buffer (10 mM HEPES pH 7.5, 10 mM MgCl2, 20 mM KCl, 1.0 M 
NaCl). The membranes were resuspended in 25 mM HEPES pH 7.5, 10 mM 
MgCl2, 20 mM KCl, 30% (w/v) glycerol, flash-frozen in liquid nitrogen and 
stored at −80 °C until purification. For purification, membranes were 
thawed on ice, incubated with 10 mM sodium succinate for 1 h at 4 °C and 
then solubilized in 50 mM HEPES pH 7.5, 800 mM NaCl, 5 mM sodium 
succinate, 10% (w/v) glycerol, 1% (w/v) lauryl maltose neopentyl glycol 
(LMNG) and 0.2% (w/v) cholesteryl hemisuccinate (CHS) at 4 °C for 4 h. 
Insoluble material was removed by centrifugation at 45,000 r.p.m. 
in a Ti-45 rotor for 90 min, imidazole (pH 7.5) was added to a final 
concentration of 20 mM and the solution was incubated with TALON resin 
(Clontech) equilibrated in 25 mM HEPES pH 7.5, 800 mM NaCl, 5 mM 
sodium succinate, 20 mM imidazole, 10% (w/v) glycerol at 4 °C overnight. 
The resin was washed with 10 column volumes of equilibration buffer 
(25 mM HEPES pH 7.5, 800 mM NaCl, 1 mM sodium succinate, 20 mM 
imidazole, 10% (w/v) glycerol, 0.005% (w/v) LMNG, 0.001% (w/v) CHS) 
followed by 10 column volumes of wash buffer (25 mM HEPES pH 7.5, 
800 mM NaCl, 1 mM sodium succinate, 25 mM imidazole, 10% (w/v) 
glycerol, 0.005% (w/v) LMNG, 0.001% (w/v) CHS). The receptor was eluted 
with 10 column volumes of elution buffer (25 mM HEPES pH 7.5, 800 mM 
NaCl, 1 mM sodium succinate, 300 mM imidazole, 10% (w/v) glycerol, 
0.005% (w/v) LMNG, 0.001% (w/v) CHS). Protein-containing fractions 
were pooled and directly used for complex formation with Nanobody6. 
Receptor preparations for other assays were further purified by size- 
exclusion chromatography on a Superdex S200 Increase 10/300 GL 
column (GE Healthcare) equilibrated in 25 mM HEPES pH 7.5, 800 mM 
NaCl, 10% (w/v) glycerol, 0.005% (w/v) LMNG, 0.001% (w/v) CHS. 
Glycerol-free receptor was purified as described above, but glycerol was 
omitted from the buffer at all steps of membrane preparation and purification. 


K18¹.³¹E/K269⁷.³²N rat SUCNR1 (humanized rat SUCNR1) 

Rat SUCNR1 (residues 2-317) with the two point mutations K18¹.³¹E and 
K269⁷.³²N was generated by site-directed mutagenesis using wild-type 
rat SUCNR1 as a template and cloned into a pIEx/Bac-3-derived plasmid 
with a haemagglutinin signal sequence followed by a Flag tag at the N 
terminus of the receptor and a 10xHis-tag preceded by a HRV 3C protease 
cleavage site at the C terminus. Baculovirus generation, expression and 
purification were performed as described for rat SUCNR1. 


Human BRIL-SUCNR1 

The sequence coding for wild-type human SUCNR1 (residues 2-334) was 
cloned into a pIEx/Bac-3-derived plasmid with a haemagglutinin signal 
sequence followed by a Flag tag and a cytochrome-B562 (BRIL) at the N 
terminus of the receptor and a 10xHis-tag preceded by a HRV 3C protease 
cleavage site at the C terminus. Baculovirus generation, expression and 
purification were performed as described for rat SUCNR1. 


Human SUCNR1 

The sequence coding for wild-type human SUCNR1 (residues 2-334) 
was cloned into a pIEx/Bac-3-derived plasmid with a haemagglutinin 
signal sequence followed by a Flag tag at the N terminus of the receptor 
and a 10xHis-tag preceded by a HRV 3C protease cleavage site at the C 
terminus. Baculovirus generation, expression and purification were 
performed as described for rat SUCNR1. 


Large-scale complex formation of rat SUCNR1 with Nanobody6 
for crystallization 

Compound 5g was added to purified rat SUCNR1 to a final 
concentration of 1 mM from a 100 mM stock solution in DMSO, giving a final 
DMSO concentration of 1% (v/v). The receptor was mixed with a 1.2 
molar excess of purified Nanobody6, incubated on ice for 30 min and 
concentrated using a 100 kDa molecular weight cut-off concentrator 
(Millipore) to a final volume of 500 µl. The complex was further purified 
by size-exclusion chromatography on a S200 Increase 10/300 GL column 
(GE Healthcare) equilibrated in 25 mM HEPES pH 7.5, 800 mM NaCl, 10% 
(w/v) glycerol, 0.002% (w/v) LMNG, 0.0004% (w/v) CHS and 25 µM of 
compound 5g. Peak fractions were pooled and concentrated 
using a 100 kDa MWCO concentrator (Millipore) to a final 
concentration of 40-50 mg ml⁻¹. The complex was flash-frozen in liquid nitrogen 
in small aliquots and stored at −80 °C until crystallization. 
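
The relationship between ligand concentration and DMSO content stated above follows directly from the dilution factor of the stock. The short sketch below reproduces that arithmetic; the function names are hypothetical, and the calculation assumes the stock is dissolved in neat DMSO and that volumes are additive.

```python
def dmso_percent(final_conc_mM, stock_conc_mM):
    """Final DMSO content (% v/v) when a ligand stock in neat DMSO is diluted."""
    return 100.0 * final_conc_mM / stock_conc_mM


def stock_volume_ul(final_conc_mM, stock_conc_mM, final_volume_ul):
    """Volume of stock needed to reach the final concentration (C1*V1 = C2*V2)."""
    return final_volume_ul * final_conc_mM / stock_conc_mM


# Values from the text: 1 mM final concentration from a 100 mM stock.
percent = dmso_percent(final_conc_mM=1.0, stock_conc_mM=100.0)

# Hypothetical sample volume of 500 µl, chosen only to illustrate the calculation.
volume = stock_volume_ul(final_conc_mM=1.0, stock_conc_mM=100.0, final_volume_ul=500.0)
```

The same function gives 0.1% (v/v) for a 0.1 mM final concentration from the same stock.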


Large-scale complex formation of humanized rat SUCNR1 with 
Nanobody6 and NF-56-EJ40 for crystallization 

NF-56-EJ40 was added to purified humanized rat SUCNR1 to a final 
concentration of 100 µM from a 100 mM stock solution in DMSO, giving a 
final DMSO concentration of 0.1% (v/v). The receptor was mixed with a 
1.1 molar excess of purified Nanobody6, incubated on ice for 30 min and 
concentrated using a 100 kDa MWCO concentrator (Millipore) to a final 
volume of 250 µl. The complex was further purified by size-exclusion 
chromatography on a S200 Increase 10/300 GL column (GE Healthcare) 
equilibrated in 25 mM HEPES pH 7.5, 800 mM NaCl, 10% (w/v) glycerol, 
0.002% (w/v) LMNG, 0.0004% (w/v) CHS and 20 µM of NF-56-EJ40. Peak 
fractions were pooled and concentrated using a 100 kDa MWCO 
concentrator (Millipore) to a final concentration of 37 mg ml⁻¹. The complex 
was flash-frozen in liquid nitrogen in small aliquots and stored at −80 °C 
until crystallization. 


Lipidic cubic phase crystallization 

The rat SUCNR1-Nanobody6 complex was reconstituted in lipidic 
cubic phase (LCP) by mixing protein at 40-50 mg ml⁻¹ with 
monoolein:cholesterol (9:1) (w:w) at a 2:3 ratio (v:w) in 50 µl Hamilton 
syringes using the two-syringe method. Crystallization trials were 
performed using Laminex glass sandwich plates (Molecular 
Dimensions) with a 200-µm spacer and dispensed using a Mosquito LCP robot 
(Labtech TTP). Protein-laden LCP (50 nl) was covered with 800 nl of 
precipitant and incubated at 20 °C in an RI-1000 imager (Formulatrix). 
The first crystals appeared within 24 h and grew to a maximum size of 
60 µm × 10 µm × 10 µm in 2 weeks in 100 mM sodium citrate pH 4.8-5.4, 
24-30% (w/v) PEG400, 50 mM NaSCN, 2.5% (v/v) 2,5-hexanediol, 1% (v/v) 
DMSO. Crystals were directly collected from the LCP bolus with MiTeGen 
micromount loops and flash-frozen in liquid nitrogen. 
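
The mixing ratios above can be turned into amounts for a given protein volume. The sketch below is a worked illustration under one stated assumption: the 2:3 (v:w) ratio is read as microlitres of protein solution to milligrams of lipid, which is a common convention but is not spelled out in the text. The function name and the 20 µl example volume are hypothetical.

```python
def lcp_lipid_masses_mg(protein_volume_ul,
                        protein_to_lipid=(2.0, 3.0),
                        monoolein_to_cholesterol=(9.0, 1.0)):
    """Lipid amounts for an LCP mix.

    protein_to_lipid: (volume parts of protein solution in µl, weight parts of lipid in mg).
    monoolein_to_cholesterol: weight ratio within the lipid mixture.
    """
    volume_parts, weight_parts = protein_to_lipid
    lipid_mg = protein_volume_ul * weight_parts / volume_parts
    monoolein_parts, cholesterol_parts = monoolein_to_cholesterol
    total_parts = monoolein_parts + cholesterol_parts
    return {
        "lipid_total_mg": lipid_mg,
        "monoolein_mg": lipid_mg * monoolein_parts / total_parts,
        "cholesterol_mg": lipid_mg * cholesterol_parts / total_parts,
    }


# Hypothetical example: 20 µl of protein solution.
mix = lcp_lipid_masses_mg(20.0)
```

For 20 µl of protein this gives 30 mg of lipid in total, of which 27 mg is monoolein and 3 mg is cholesterol.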

The humanized rat SUCNR1-Nanobody6-NF-56-EJ40 complex 
was reconstituted in LCP by mixing protein at 37 mg ml⁻¹ with 
monoolein:cholesterol (9:1) (w:w) at a 2:3 ratio (v:w) in 50 µl Hamilton 
syringes using the two-syringe method. Crystallization trials were 
performed using Swissci Xpol glass sandwich plates (Swissci) with a 
200-µm spacer and dispensed using a Mosquito LCP robot (Labtech TTP) 
with active humidification of the pipetting chamber. Protein-laden LCP 
(50 nl) was covered with 800 nl of precipitant and incubated at 20 °C 
in an RI-1000 imager (Formulatrix). The first crystals appeared within 
12 h and grew to a maximum size of 60 µm × 60 µm × 60 µm in 5 days 
in 50 mM 2-[(2-amino-2-oxoethyl)-(carboxymethyl)amino]acetic acid 
(ADA) pH 7.0, 28% (w/v) poly(ethylene glycol) monomethyl ether (PEG 
MME) 550, 0.55 M (NH4)2SO4, 100-400 µM NF-56-EJ40 and 1-4% (v/v) 
DMSO. Crystals were directly collected from the LCP bolus with MiTeGen 
micromount loops and flash-frozen in liquid nitrogen. 


Data collection, structure solution and refinement 

Data for the rat SUCNR1-Nanobody6 complex were collected at PXI 
at the Swiss Light Source, Villigen, Switzerland using a 10-µm and a 
20-µm diameter beam at a wavelength of 0.999 Å on a Dectris Eiger-16M 
detector. Crystals were exposed for 0.1 s per 0.1° oscillation per frame 
using an attenuated beam to reduce radiation damage. Datasets were 
integrated, scaled and merged using XDS and XSCALE in autoPROC 
and aP_Scale (Global Phasing). The final dataset was merged from 
18 crystals and anisotropic-scaled with diffraction limits of 2.959 Å, 
2.088 Å and 2.345 Å. The structure was solved by molecular 
replacement in Phaser using the structure of the P2Y1 receptor without 
the rubredoxin fusion as a search model (PDB ID: 4XNV). An initial 
structural model was built using Autobuild/Resolve in Phenix and 
further adjusted by repetitive rounds of manual model building in 
Coot and refinement against the anisotropic-scaled data in Buster 
(Global Phasing). 99.76% of residues are within the allowed regions of 
the Ramachandran plot, with 0.24% being outliers and a clash-score 
of 5. The final model lacks residues 1-5 at the N terminus, residues 
160-167 in ECL2 and residues 214-223 in ICL3 of rat SUCNR1 and the 
first 9 residues at the N terminus of Nanobody6. 
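
The X-ray wavelength given above can be converted to photon energy with E = hc/λ. The sketch below shows that conversion as a convenience only; the function name is hypothetical and the constant is the product hc expressed in keV·Å.

```python
HC_KEV_ANGSTROM = 12.398419843  # Planck constant times speed of light, in keV·Å


def photon_energy_kev(wavelength_angstrom):
    """Photon energy in keV for an X-ray wavelength in Å (E = hc / lambda)."""
    if wavelength_angstrom <= 0:
        raise ValueError("wavelength must be positive")
    return HC_KEV_ANGSTROM / wavelength_angstrom


# Wavelength reported for the rat SUCNR1-Nanobody6 data collection.
energy_kev = photon_energy_kev(0.999)
```

A wavelength of 0.999 Å corresponds to a photon energy of roughly 12.4 keV.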

Data for the humanized rat SUCNR1-Nanobody6-NF-56-EJ40 complex 
were collected at PXI at the Swiss Light Source using a 20-µm 
diameter beam at a wavelength of 1.0002 Å on a Dectris Eiger-16M detector. 
Crystals were exposed for 0.1 s per 0.2° oscillation per frame using an 
attenuated beam to reduce radiation damage. Datasets were integrated, 
scaled and merged using XDS and Aimless in autoPROC and aP_Scale 
(Global Phasing). The final dataset was obtained from a single crystal 
and anisotropic-scaled with diffraction limits of 2.327 Å, 1.940 Å and 
1.959 Å. The structure was solved by molecular replacement in Phaser 
using the structure of the wild-type rat SUCNR1-Nanobody6 complex as 
a search model. The structural model was further adjusted by repetitive 
rounds of manual model building in Coot and refinement against the 
anisotropic-scaled data in Buster (Global Phasing). 100% of residues are 
within the allowed regions of the Ramachandran plot, and the model has 
a clash-score of 4. The final model lacks residues 1-6 at the N terminus, 
residues 215-223 in ICL3 and residues 257-261 in ECL3 of humanized 
rat SUCNR1 and the first 8 residues at the N terminus of Nanobody6. 


Molecular docking 

Homology modelling of human SUCNR1 was performed with Prime 
(Schrödinger, v.2018-3). The coordinates of the high-resolution structure 
of the full-length wild-type rat isoform of SUCNR1 were used as a 
template. Missing residues 160-167 in ECL2 in the crystal structure were 
reconstructed by Prime. Default parameters were used. Compounds 
were prepared for docking using Corina (MN-AM, v.4.2.0) for initial 
conformation generation and protonated at pH 7.4 with the blabber_sd 
utility from the MoKa package (Molecular Discovery, v.2.6.6). Dockings 
were performed with Glide (Schrödinger, v.2018-3) with the SP mode 
and default parameters. Poses were visually inspected for selection. 


Nano-differential scanning fluorimetry thermostability assay 
Protein samples at 0.4 mg ml⁻¹ were prepared from 4.0 mg ml⁻¹ stock 
by adding assay buffer with or without glycerol (25 mM HEPES pH 7.5, 
800 mM NaCl, 0-10% (w/v) glycerol, 0.005% (w/v) LMNG, 0.001% (w/v) 
CHS) and ligand as required. The ligand added corresponded to no more 
than 5% of the final assay volume. Samples were incubated on ice and manually 
loaded into standard nano-differential scanning fluorimetry (nano-DSF) 
grade capillaries (NanoTemper Technologies). Experiments were 
performed with a Prometheus nano-DSF instrument (NanoTemper 
Technologies) with a temperature gradient from 20 °C to 95 °C and a 
temperature slope of 2.0 °C or 2.5 °C min⁻¹. Data were processed and analysed 
using the PR.ThermControl Software (NanoTemper Technologies). 

For the assessment of compound stability, nano-DSF was performed 
at a protein concentration of 0.2 mg ml⁻¹ in 25 mM HEPES pH 7.5, 800 mM 
NaCl, 10% (w/v) glycerol, 0.01% (w/v) LMNG, 0.002% (w/v) CHS in 
the presence of 50 µM compound and 0.5% (v/v) DMSO. 
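
As a worked illustration of the thermostability sample preparation described above, the following sketch computes the volumes needed for one sample. The 0.4 mg ml⁻¹ target, the 4.0 mg ml⁻¹ stock and the 5% ligand ceiling are taken from the text; the 20 µl final volume and the function name `nano_dsf_mix` are hypothetical and chosen only for this example.

```python
def nano_dsf_mix(final_volume_ul, stock_mg_ml=4.0, target_mg_ml=0.4,
                 ligand_fraction=0.05):
    """Return (protein, ligand, buffer) volumes in microlitres for one sample.

    ligand_fraction is the share of the final volume taken up by ligand;
    the Methods cap it at 5% of the final assay.
    """
    if final_volume_ul <= 0:
        raise ValueError("final volume must be positive")
    if not 0.0 <= ligand_fraction <= 0.05:
        raise ValueError("ligand must make up no more than 5% of the final volume")
    # C1 * V1 = C2 * V2, solved for the stock volume V1.
    protein_ul = final_volume_ul * target_mg_ml / stock_mg_ml
    ligand_ul = final_volume_ul * ligand_fraction
    # Assay buffer fills the remainder of the sample.
    buffer_ul = final_volume_ul - protein_ul - ligand_ul
    return protein_ul, ligand_ul, buffer_ul


# Hypothetical 20 µl sample (the Methods do not state the sample volume).
protein_ul, ligand_ul, buffer_ul = nano_dsf_mix(20.0)
print(protein_ul, ligand_ul, buffer_ul)
```

Going from the 4.0 mg ml⁻¹ stock to 0.4 mg ml⁻¹ is a tenfold dilution, so the protein stock always accounts for one tenth of the final volume regardless of how much ligand is added.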


Analytical size-exclusion chromatography 

Samples were analysed on an HPLC 1100 instrument (Agilent) equipped 
with a Zenix-C SEC-300 column (Sepax Technologies) and a photodiode 
detector (Wyatt) in 50 mM MES pH 6.0, 500 mM NaCl, 0.01% (w/v) LMNG 
with a flow rate of 0.3 ml min⁻¹ or on an ÄKTA Micro equipped with a 
S200 Increase 3.2/300 column (GE Healthcare) in 25 mM HEPES pH 7.5, 
800 mM NaCl, 10% (w/v) glycerol, 0.01% (w/v) LMNG and 0.002% (w/v) 
CHS at a flow rate of 50 µl min⁻¹. 


[³⁵S]GTPγS assay 

Membranes were prepared from stably transfected rat SUCNR1 CHO-K1 
and human S1P1 CHO-K1 cell lines (Novartis) and from Chem1 cells 
expressing human SUCNR1 (Millipore). Cell cultures were rinsed with PBS 
and the cells were scraped off the flasks in ice-cold 20 mM HEPES buffer 
pH 7.4, 10 mM EDTA containing protease inhibitors (Roche). The pellet 
obtained after centrifugation for 30 min at 17,500g was resuspended in 
ice-cold buffer as above supplemented with 100 mM NaCl and homogenized 
using a PT 1300 D homogenizer (Polytron) at 25,000 r.p.m. for 
three intervals of 20 s each. The homogenate was centrifuged at 39,000g 
for 40 min at 4 °C, the pellet resuspended and homogenized again at 
25,000 r.p.m. for 20 s (PT 1300 D homogenizer). Aliquots were stored 
at −80 °C. 

For [³⁵S]GTPγS assays, membranes were resuspended in assay buffer 
(20 mM HEPES pH 7.4, 100 mM NaCl, 10 mM MgCl₂, 25 µg ml⁻¹ saponin, 
100 µM GDP, 0.1% fat-free bovine serum albumin (BSA)) containing 
wheat germ agglutinin (WGA)-coated scintillation proximity assay (SPA) 
beads (Perkin Elmer). The final assay mixture in 96-well Optiplates (215 µl 
final volume; Perkin Elmer) contained 15 µg rat or human SUCNR1 or 
3 µg human S1P1R membranes, 0.2 mg (rat or human SUCNR1) or 1 mg 
(human S1P1R) WGA SPA beads, 200 pM [³⁵S]GTPγS (Perkin Elmer), 
20 mM HEPES pH 7.4, 100 mM NaCl, 10 mM MgCl₂, 25 µg ml⁻¹ saponin, 
10 µM GDP, 0.1% fat-free BSA, receptor agonist (succinate or S1P) and 
test agents as indicated (Nanobody6 or glycerol). Assay plates were 
sealed and incubated with continuous shaking for 60 min at room temperature. 
Afterwards the plate was centrifuged for 10 min at 1,200 r.p.m. 
and the radioactivity counted using a TopCount NXT instrument (Perkin 
Elmer). 


[³H] Radioligand-binding assay 

Cytomegalovirus promoter-based SUCNR1 expression plasmids containing 
a haemagglutinin signal sequence followed by a Flag tag at 
the N terminus of the receptor and a 10xHis-tag preceded by a HRV 3C 
protease cleavage site at the C terminus were transiently transfected 
(Lipofectamine 2000; Life Technologies) into HEK293FT cells (Life 
Technologies). Membranes were prepared two days after transfection 
as described above and homogenized in Krebs-Tris buffer (20 mM 
Tris-HCl pH 7.4, 118 mM NaCl, 5.6 mM glucose, 1.2 mM KH₂PO₄, 4.7 mM 
KCl, 1.2 mM MgSO₄, 1.8 mM CaCl₂). Assay mixtures (200 µl) in 96-well 
multiscreen filter plates MSFBN6B10 (Millipore) contained 20 µg cell 
membranes, 50 nM [³H]NF-56-EJ40 (963 GBq mmol⁻¹; synthesized at 
Novartis) and test compounds and were incubated for 1 h at room 
temperature. After two washes with 200 µl ice-cold Krebs-Tris buffer 
the plates were air-dried, the bottom of the plates sealed and 40 µl of 
scintillation fluid was added (MicroScint PS, Perkin Elmer). Radioactivity 
was counted using a TopCount NXT (Perkin Elmer). Half-maximum 
inhibitory concentration (IC₅₀) values were determined from eight-point 
concentration-response curves and Kᵢ values were calculated 
using the Cheng-Prusoff equation. 
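
The Cheng-Prusoff correction named above relates Kᵢ to IC₅₀ through the radioligand concentration [L] and the radioligand dissociation constant Kd: Kᵢ = IC₅₀ / (1 + [L]/Kd). A minimal sketch of that calculation follows. Only the 50 nM radioligand concentration comes from the Methods; the Kd and IC₅₀ values are placeholders, because the text does not report them, and the function name is hypothetical.

```python
def cheng_prusoff_ki(ic50_nm, radioligand_nm, kd_nm):
    """Cheng-Prusoff: Ki = IC50 / (1 + [L] / Kd), all concentrations in nM."""
    if ic50_nm <= 0 or kd_nm <= 0 or radioligand_nm < 0:
        raise ValueError("IC50 and Kd must be positive; [L] must be non-negative")
    return ic50_nm / (1.0 + radioligand_nm / kd_nm)


RADIOLIGAND_NM = 50.0  # [3H]NF-56-EJ40 concentration stated in the Methods
KD_NM = 25.0           # placeholder: the radioligand Kd is not given in the text
IC50_NM = 60.0         # placeholder: an example IC50 from a competition curve

ki_nm = cheng_prusoff_ki(IC50_NM, RADIOLIGAND_NM, KD_NM)
print(f"Ki = {ki_nm:.1f} nM")
```

Because the radioligand competes with the test compound, the measured IC₅₀ overestimates Kᵢ by the factor 1 + [L]/Kd; with no radioligand present the two values coincide.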


Figure preparation 
Protein structure figures were prepared using PyMOL incentive 2.0.7 
(Schrödinger). 

Sequences were aligned with ClustalOmega and figures were prepared 
using ESPript3.0. ConSurf was used to visualize sequence 
conservation on structures. [³⁵S]GTPγS assay data were analysed and 
figures prepared using GraphPad Prism. 


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this article. 


Data availability 

Structure factors and coordinates of the rat SUCNR1-Nanobody6 and 
the SUCNR1(K18E/K269N)-Nanobody6-NF-56-EJ40 complex 
structures have been deposited in the Protein Data Bank (PDB) under 
accession codes 6IBB and 6RNK, respectively. All source data associated 
with the paper (in addition to those deposited) are provided as 
Supplementary Information. 

Extended Data Fig. 1 | The succinate-SUCNR1 signalling axis. Levels of the 
Krebs cycle intermediate succinate are increased under certain conditions such 
as hypoxia, necrosis, ischaemia reperfusion and inflammation. The ways in 
which succinate concentrations increase in the mitochondrion are shown in 
green. Mitochondrial reactive oxygen species result from the reversed electron 
transport (RET) chain driven by an increase in succinate. Succinate is 
transported into the cytoplasm, where it can stabilize HIF1α and increase the 
expression of genes that have HIF-responsive elements, such as IL1B. Further 
succinate is exported into the local extracellular environment, where it 
accumulates and binds and activates SUCNR1. Several of the consequences of 
this are shown in red. GABA, γ-aminobutyric acid; HIF, hypoxic inducible factor; 
IL-1, interleukin-1; mMROS, mitochondrial reactive oxygen species; SDH, succinate 
dehydrogenase. 

Extended Data Fig. 2 | Interaction of Nanobody6 with rat and human SUCNR1 
and characterization of the effects of glycerol and 2,5-hexanediol on rat 
SUCNR1. a, b, Nanobody6 increases the thermal stability of rat SUCNR1 in nano-DSF 
thermal shift assays. The average melting curves in the absence (a) and 
presence (b) of Nanobody6 are shown (n = 3; technical replicates). The 
experiment was repeated independently 3 times with similar results. 
c, Analytical size-exclusion chromatography shows a clear shift in the peak of the 
rat SUCNR1-Nanobody6 complex compared to the peak of the receptor alone. 
The complex samples contain a 1.2 molar excess of Nanobody6 over receptor. 
One of n = 2 independent experiments is shown. d, [³⁵S]GTPγS assay on wild-type 
rat SUCNR1 in the absence or the presence of increasing concentrations of 
Nanobody6. The average curves of n = 3 independent experiments are shown; 
data are mean ± s.d. Average half-maximum effective concentration (EC₅₀) values 
from n = 3 independent experiments are listed; data are mean ± s.d. e, Analytical 
size-exclusion chromatography shows a clear shift in the peak of the N-terminal 
BRIL-fused human SUCNR1-Nanobody6 complex compared to the peak of 
the receptor alone. One of n = 2 independent experiments is shown. f, [³⁵S]GTPγS 
assay on wild-type human SUCNR1 in the absence or the presence of increasing 
concentrations of Nanobody6. The average curves of n = 3 independent 
experiments are shown and average EC₅₀ values from n = 3 independent 
experiments are listed; data are mean ± s.d. g, Analytical size-exclusion 
chromatography of rat SUCNR1 purified in the absence or the presence of 10% 
glycerol. One of n = 2 independent experiments is shown. h, Glycerol increases 
the thermal stability of rat SUCNR1 as evidenced from nano-DSF assays (n = 3 
technical replicates; bars represent mean values; individual data points are 
indicated by circles). The control sample was purified in the presence of 10% 
glycerol. All other samples contain rat SUCNR1 purified without glycerol, to 
which the respective final glycerol concentration was added. The experiment 
was repeated independently twice with similar results. i, 2,5-Hexanediol 
decreases the thermal stability of rat SUCNR1 in nano-DSF assays (n = 3 technical 
replicates; bars represent mean values; individual data points are indicated by 
circles). The experiment was repeated independently twice with similar results. 
j, k, [³⁵S]GTPγS assay of rat SUCNR1 (j) and human S1P1R (k) in the absence or the 
presence of 1% (109 mM) glycerol. The average curve of n = 3 independent 
experiments and the individual data points of each experiment are shown. Data 
are mean ± s.d. 

Extended Data Fig. 3 | Purification, crystallization and electron-density 
map quality of the rat SUCNR1-Nanobody6 complex, with detailed binding 
modes of glycerol and 2,5-hexanediol. a, Analytical size-exclusion 
chromatography and SDS-PAGE analysis of crystallization samples of the rat 
SUCNR1-Nanobody6 complex. Shown is a representative experiment of n = 5 
independent experiments with similar results. For gel source data, see 
Supplementary Fig. 1a. b, Initial crystallization hits for the rat SUCNR1-Nanobody6 
complex (top) and optimized crystals used for data collection 
(bottom). Shown are representative experiments of n = 20 independent 
experiments. c, The 2Fo − Fc electron-density map contoured at 1.5σ for a part of 
Nanobody6. d, The 2Fo − Fc electron-density map contoured at 1.5σ for helix VII in 
rat SUCNR1. e, f, Fo − Fc composite omit map for glycerol (e) and corresponding 
2Fo − Fc electron-density map after refinement (f). Both maps are contoured at 
1.5σ. g, h, Fo − Fc composite omit map for 2,5-hexanediol (g) and corresponding 
2Fo − Fc electron-density map after refinement (h). Both maps are contoured at 
1.5σ. i, j, Detailed views of the side-chain environment around 2,5-hexanediol (i) 
and glycerol (j). Hydrogen bonds are indicated by dashed lines. For clarity, only 
residues within a distance of 4 Å are shown. 

Extended Data Fig. 4 | Rat SUCNR1 adopts an inactive conformation in 
complex with Nanobody6, which binds to the intracellular side via an 
extended CDR3 loop. a, Structural alignment of helix VI in rat SUCNR1 and the 
active and inactive states of β2-AR, with the positions of key residues as 
hallmarks for inactive and active receptor states. SUCNR1 is shown in blue, 
inactive β2-AR in red and active β2-AR in pink. Alignment of helix VI (left) and 
side-chain positions of key residues (R3.50 and Y7.53) (right) indicate an inactive 
state for rat SUCNR1. b, Structural alignment of helix VI and key residues (R3.50 
and Y7.53) of rat SUCNR1 (blue) and the P2Y1 receptor (orange) in the inactive 
conformation. c, Superposition of the rat SUCNR1-Nanobody6 complex with 
the Gαs subunit from the β2-AR Gs-protein trimer structure (PDB ID: 3SN6). Rat 
SUCNR1 is shown in blue, Nanobody6 in orange and the Gαs subunit in red. The 
G-protein and the Nanobody6-binding site partially overlap. d, Magnified view 
of the overlap between the G-protein and the Nanobody6-binding site of rat 
SUCNR1. e, Structural alignment of nanobodies used to crystallize GPCRs. The 
extended CDR3 of Nanobody6 forms a helical secondary structure. f, Sequence 
alignment of the GPCR-stabilizing nanobodies shown in e. The PDB IDs of the 
respective GPCR-nanobody complex structures (4MQS, 4XT1, 5JQH and 6B73) 
are listed and the CDR3 region 
is highlighted by a black bar. 

Extended Data Fig. 5 | Sequence alignment of rat, human and mouse SUCNR1 
and ConSurf analysis. a, The helical structure elements as observed in the 
crystal structure of apo rat SUCNR1 are indicated. Sequences corresponding to 
ECL1 and ECL2 are boxed in blue and the non-conserved residues K18 and 
K269 are marked by blue arrows. Yellow arrowheads indicate residues that 
were previously reported to be involved in succinate-induced receptor 
activation. Green dots indicate residues that are involved in NF-56-EJ40 binding 
in the humanized rat SUCNR1 structure. b, The sequence alignment of SUCNR1 
from various species is colour-coded from turquoise (variable) to dark pink 
(conserved), similar to the colours used in Extended Data Fig. 6a, b, on the basis 
of analysis with ConSurf. Residues involved in the binding of Nanobody6 are 
indicated by blue triangles. 

Extended Data Fig. 6 | Potential ligand-binding site in apo SUCNR1 is 
partially occluded by ECL2. a, Side view (top) and top view (bottom) of the 
hydrophobic pocket located below the glycerol molecule. The sequence 
conservation within the SUCNR1 receptor family is indicated by colour, ranging 
from turquoise for highly variable residues to dark pink for highly conserved 
residues. The hydrophobic pocket is shown as a surface, colour-coded by 
charge. The glycerol molecule is shown as yellow sticks. b, The same orientations 
as in a are shown. Residues forming the deep hydrophobic pocket are shown as 
sticks, colour-coded by sequence conservation as in a. Residues that were 
previously reported to be involved in succinate-induced receptor activation are 
coloured yellow. c, Residues in the environment of ECL2 are shown as sticks and 
hydrogen bonds are shown as dashed black lines. Residues that have previously 
been reported to have an effect on succinate binding by the receptor are shown 
in yellow; R251, which was identified in a second study, is shown in green. 

Extended Data Fig. 7 | Identification of critical residues that impart species 
selectivity for antagonist binding. a, Side view (left) and top view (right) of the 
potential binding mode of the antagonist NF-56-EJ40 (shown in pink), obtained 
by molecular modelling based on the crystal structure of apo rat SUCNR1. 
Residues within 4 Å of NF-56-EJ40 are shown as sticks. Red arrows point towards 
two sites in which potential steric clashes may occur with rat-SUCNR1-specific 
residues K18 and K269, which are shown in blue. For clarity, ECL2, helix IV 
and helix V are omitted in the side view. b, Radioligand competition binding 
experiments with unlabelled NF-56-EJ40 on human SUCNR1 mutant proteins. 
Curves were calculated from n = 3 independent experiments; data are 
mean ± s.d. Individual data points are shown. 

Extended Data Fig. 8 | Purification, crystallization and electron-density 
map quality of the rat SUCNR1-Nanobody6-NF-56-EJ40 complex and 
structural changes in SUCNR1 after antagonist binding. a, Analytical size-exclusion 
chromatography and SDS-PAGE analysis of crystallization samples of 
the humanized rat SUCNR1-Nanobody6-NF-56-EJ40 complex. Shown is a 
typical result from n = 2 independent experiments. Although partial complex 
aggregation was observed, this did not interfere with crystallization. For gel 
source data, see Supplementary Fig. 1b. b, Top, initial crystallization hits for the 
humanized rat SUCNR1-Nanobody6-NF-56-EJ40 complex, shown in normal 
(left) and cross-polarization (right) imaging modes. Bottom, optimized crystals 
used for data collection shown in normal (left) or cross-polarization (right) 
imaging modes. A typical result from n = 3 independent experiments is shown. 
c, The 2Fo − Fc electron-density map contoured at 1.5σ for a part of Nanobody6. 
d, The 2Fo − Fc electron-density map contoured at 1.5σ for helix VII in humanized 
rat SUCNR1. e, Fo − Fc composite omit map (top) for NF-56-EJ40 and glycerol 
contoured at 1.5σ is shown in orange. The 2Fo − Fc map (bottom) for NF-56-EJ40 
and glycerol after refinement contoured at 1.5σ is shown in blue. f, Top view of 
apo rat SUCNR1 (shown in orange) and humanized rat SUCNR1 (shown in blue) in 
complex with NF-56-EJ40 (shown as green sticks). Large structural 
rearrangements are indicated by red arrows. Note that ECL2 is completely 
structured in the humanized rat SUCNR1 structure. g, Top view of the 
NF-56-EJ40-binding site in humanized rat SUCNR1 overlaid with apo rat SUCNR1. 
Important side chains around NF-56-EJ40 (shown in green) are shown as sticks 
and are coloured blue for humanized rat SUCNR1 or orange for apo wild-type rat 
SUCNR1. For clarity, only the backbone of the humanized rat SUCNR1 is shown in 
cartoon representation. h, Side chains that directly interact with NF-56-EJ40 via 
hydrogen bonding, π-π stacking and cation-π stacking are listed in black. The 
hydrogen-bonding interactions are shown by black dashed lines and the π-π and 
cation-π interactions are shown by green dashed lines. Additional residues with 
van der Waals interactions are listed in green, and their interaction surfaces are 
indicated by solid green lines. i, Top view of the humanized rat SUCNR1 (blue) in 
complex with NF-56-EJ40 (green sticks) and of the apo rat SUCNR1-derived 
model of human SUCNR1 (red) with the binding mode of NF-56-EJ40 (pink) from 
molecular-docking studies, for which details are shown in Extended Data Fig. 7a. 
Note how both NF-56-EJ40 poses differ considerably, as indicated by red arrows. 
j, Detailed views of the NF-56-EJ40-binding site. For clarity, only side chains are 
shown. Humanized rat SUCNR1 is shown in blue; wild-type rat SUCNR1 is shown 
in orange. Note the side-chain flips for R95, L98, H99 and Y171 between 
both structures. k, Top view of wild-type rat SUCNR1 structure (left), the apo rat 
SUCNR1-based homology model of human SUCNR1 in complex with NF-56-EJ40 
(middle) and the humanized rat SUCNR1 structure in complex with NF-56-EJ40 
(right). The surface is shown coloured by electrostatic charge. The two key 
positions (K/E and K/N) are highlighted by arrows. NF-56-EJ40 is shown in 
yellow as a ball-and-stick model. Note the differences between the 
NF-56-EJ40-binding mode determined in the modelled structure and in the crystal structure, 
and between the surface charge distributions in rat, human and humanized rat 
SUCNR1. 

Extended Data Table 1 | Data collection and refinement statistics 

                        rat SUCNR1-Nanobody6         humanized rat SUCNR1-NF-56-EJ40 
                        (PDB 6IBB)†                  (PDB 6RNK)‡ 
Data collection 
Space group             P2₁                          C2 
Cell dimensions 
  a, b, c (Å)           60.22, 164, 63.42            76.83, 150.93, 68.23 
  α, β, γ (°)           90, 102.634, 90              90, 112.42, 90 
Resolution (Å)*         82.00-2.103 (2.288-2.103)    75.47-1.949 (2.085-1.949) 
Rmerge*                 0.302 (4.965)                0.105 (1.579) 
I/σI*                   12.3 (1.4)                   10.4 (1.3) 
Completeness* (%) 
  spherical             59.4 (7.1)                   77.6 (21.3) 
  ellipsoidal           92.3 (64.3)                  92.7 (57.0) 
Redundancy*             30.9 (33.5)                  7.0 (7.4) 
Refinement 
Resolution (Å)          41.55-2.12                   75.47-1.94 
No. reflections         41256                        40585 
Rwork / Rfree           17.8 / 21.6                  V7 19.9 
No. atoms 
  Protein               6643                         3281 
  Ligand/ion            374                          286 
  Water                 299                          242 
B-factors 
  Protein               56.38                        42.13 
  Ligand/ion            77.95                        69.17 
  Water                 61.65                        61.90 
R.m.s. deviations 
  Bond lengths (Å)      0.014                        0.014 
  Bond angles (°)       1.70                         1.56 

*Values in parentheses are for the highest-resolution shell. 
†Data merged from 18 crystals. 
‡Data obtained from a single crystal. 

Extended Data Table 2 | Data for the binding of radioligand 
[³H]NF-56-EJ40 with human and rat SUCNR1 mutants 

Mutant            Kᵢ (nM)    pKᵢ ± s.d. 
Human SUCNR1      17.4       7.76 ± 0.06 
E18A              n.a.       n.a. 
E18R              41.6       7.43 ± 0.26 
E18K              42.2       7.39 ± 0.15 
E22A              67.5       7.18 ± 0.10 
E22K              n.a.       n.a. 
Y30F              74.9       7.14 ± 0.12 
S82A              14.2       7.84 ± 0.06 
Y83F              n.a.       n.a. 
Y83A              n.a.       n.a. 
C95S              n.a.       n.a. 
R99A              16.1       7.81 ± 0.13 
H103A             37.3       7.54 ± 0.38 
T170A             33.8       7.48 ± 0.08 
T170K             23.9       7.64 ± 0.13 
T171A             26.4       7.58 ± 0.06 
T171K             21.4       7.68 ± 0.10 
C172S             n.a.       n.a. 
S177A             19.0       7.72 ± 0.03 
Y248F             7.8        8.12 ± 0.11 
Y248A             n.a.       n.a. 
R252A             n.a.       n.a. 
R255A             16.9       7.78 ± 0.06 
N274A             14.4       7.86 ± 0.17 
N274K             Zot        7.63 ± 0.05 
Y277F             12.7       7.90 ± 0.1 
Y277A             n.a.       n.a. 
R281A             n.a.       n.a. 
E22K, K274N       n.a.       n.a. 
Rat SUCNR1 
K18E, K269N       ee)        7.48 ± 0.09 

n.a., binding signal too low (<300 counts per minute) to determine Kᵢ. Values were determined 
from n = 3 independent experiments. 
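
For readers relating the two numeric columns of Extended Data Table 2, pKᵢ is the negative base-10 logarithm of Kᵢ expressed in molar units. The sketch below converts a Kᵢ given in nM to pKᵢ; the helper name is hypothetical. Values recomputed this way need not match the table to the last digit, presumably because the tabulated pKᵢ values are averaged over the n = 3 experiments (an assumption; the table does not state how the averages were formed).

```python
import math


def pki_from_ki_nm(ki_nm):
    """Convert an inhibition constant in nM to pKi = -log10(Ki in mol/l)."""
    if ki_nm <= 0:
        raise ValueError("Ki must be positive")
    return -math.log10(ki_nm * 1e-9)


# Wild-type human SUCNR1 from the table: Ki = 17.4 nM, reported pKi 7.76.
print(round(pki_from_ki_nm(17.4), 2))
```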
Policy information about availability of computer code 

Data collection: Synchrotron data collection software interface at beamline X06SA (PXI, SLS/PSI, Villigen, Switzerland). Thermostability data were collected 
using NanoTemper PR.ThermControl version 2.1. Size-exclusion chromatography data were collected using UNICORN version 6.3.2 and 
ChemStation Rev.B.04.03-SP1 on Agilent HPLC systems. 

Data analysis: X-ray diffraction data were analyzed with XDS and XSCALE Version May 1, 2016 (BUILT 20160617) and AUTOPROC utilizing Pointless 
version 1.11.3, AIMLESS version 0.5.32, CCP4 version 7.0.0.44 and Staraniso version 1.9.6 (20170920). The structure was solved by MR in 
Phaser Version 2.8.0 as implemented in Phenix Version 1.15.2-3472 and the structure was refined in Buster Version 2.11.07. An initial 
structural model was built using AUTOBUILD/RESOLVE in Phenix Version 1.15.2-3472 and further model building was performed in COOT 
version 0.8.9.1. Structural figures were prepared with PyMOL incentive 2.0.7. Biochemical assay data were analyzed in GraphPad Prism 
version 7.04 and Microsoft Excel 2016. Thermostability assay data were analyzed with NanoTemper PR.ThermControl Software version 
2.1 and GraphPad Prism versions 7.04 and 8.1.2. For homology modeling and molecular docking, Prime (version 
2018-3), Corina (version 4.2.0), blabber_sd from the MoKa package (version 2.6.6) and Glide (version 2018-3) were used. Sequences 
were aligned with ClustalOmega (https://www.ebi.ac.uk/Tools/msa/clustalo/) and sequence alignment figures prepared using ESPript3.0. 

X-ray structure coordinates and structure factors have been deposited in the Protein Data Bank under accession codes 6IBB and 6RNK. There are no restrictions on 
data availability. 

Sample size: No statistical methods were used to predetermine sample size. The determined sample size was adequate as the differences between 
experimental groups were reproducible, as indicated. X-ray diffraction data were collected until completeness of the data set. 

Data exclusions: No data were excluded from the analysis. 

Replication: All attempts at replication of biochemical and signaling assays succeeded. The experimental findings were reproduced in independent 
experiments. The number of independent experiments and biological replicates in each data panel is indicated in the figure legends. 

Randomization: No randomization was attempted or needed. Randomization was not formally performed in this study as it did not involve animals and/or 
human research participants. 

Blinding: Authors were not blinded. No blinding was attempted or needed. Blinding is not relevant for protein structure determination or functional 
assays as the results are not subjective. 

Reporting for specific materials, systems and methods 

Cell line source(s): Sf9 insect cells (Life Technologies), HEK293FT (Life Technologies), SUCNR1 CHO-K1 (Novartis), S1P1R CHO-K1 (Novartis), 
SUCNR1 Chem1 (Millipore) 

Authentication: Cell lines were maintained by the supplier. No additional authentication was performed by the authors of this study. 

Mycoplasma contamination: Cell lines were tested for mycoplasma contamination and shown to be free from mycoplasma. 

Commonly misidentified lines: No commonly misidentified cell lines were used. 
(See ICLAC register) 


SHARED PARENTAL 
LEAVE FOR THE FAMILY 


Lessons learnt in balancing academia and 
early parenthood. By Lynsey Bunnefeld 


My husband Nils and I work in the 
biological and environmental 
sciences department at the University 
of Stirling, UK, and we had our 
first baby in May 2018. Before our 
son Euan was born, we decided to make use of 
the United Kingdom’s shared parental leave 
(SPL) policy. This scheme allows parents who 
meet certain eligibility criteria to share up to 
50 weeks of leave, of which 37 are paid, in their 
child’s first year of life. Our decision placed us 
among the 1% of all eligible couples nationwide 
who actually take the leave. 

We had a loose plan: Nils would take two 
months’ leave when our son was ten months 
old, at which point I would go back to full- 
time work. Because this would be in the final 
three months of our allotted leave, which in 
the United Kingdom are unpaid, and because 
my husband earns more than I do, it would 
involve a bigger salary loss than if I took those 
months off. However, we were able to take the 
financial hit, and although we reasoned that 
it might be difficult because Nils manages a 
large research group (mine, focused on ecology, 
is much smaller), we decided that it would 
be worth it for the time he would get to spend 
with our baby. 

Fast-forward through the six life-changing 
months following Euan’s birth, and it was 
becoming apparent that my mental health 
might benefit from my returning to work a 
little bit earlier than planned. Also, we were 
both concerned that Nils leaving his group to 
manage itself for two months might be asking 
too much, so we changed our plan. Under the 
policy, shared leave can be discontinuous, so 
we decided to split up the final three months of 
leave. One of us would work one week, while the 
other took leave, and the next week we would 
switch. To minimize disruption to our departments, 
Nils committed to all of his teaching and 
administration during these three months. The 
leave was approved, and we were all set. Nils 
was excited about the time ‘off’ (I did try to tell 
him that a day with a baby is not really time off), 
and I was excited about activating parts of my 
brain that had been dormant for a while. 

© 2019 Springer Nature Limited. All rights reserved. 

This worked perfectly for some time, but in 
the third week of our ‘one week on, one week 
off’ cycle, trouble started to brew. Euan didn’t 
nap, so Nils’s Skype meeting with a collaborator 
couldn't happen. Students started knocking 
on my office door, asking why my husband 
hadn’t replied to their e-mails. Nils read man- 
uscripts in the evening once the baby was in 
bed. Assignment marking started to roll in. And 
things further unravelled from there. Although 
Nils continued to enjoy his time with Euan, he 
became increasingly anxious about work as 
he squeezed in e-mails and Skype calls when- 
ever he could. He was not able to fully switch 
off his work brain and completely engage with 
the baby. 

After a few more weeks, we adjusted our 
schedules so that we were both working part- 
time each week. This did alleviate some stress, 
and Nils stayed more on top of his responsi- 
bilities — but it was hard for me, having just 
returned to work, to get into any kind of 
rhythm. We were both exhausted and sleep- 
walking through our lives at work and home. 

Many countries don’t have policies similar 
to the United Kingdom’s SPL, but here is our 
advice to academic couples who are in a posi- 
tion to make use of such benefits in the United 
Kingdom and elsewhere. 


Top tips 

Make use of SPL. We affectionately call our SPL 
a ‘car crash’, but for Nils, Euan and me, the crash 
was totally worth it. Nils has a much better 
appreciation of what a day with a baby is like, 
the two had lots of fun together and the baby 
is totally happy at home with either me or my 
husband. 

Actively put measures in place to ensure that 
the partner at home with the baby can be fully 
engaged with being at home. In hindsight, we 
agree that Nils should have made sure that all of 
his research students had alternative supervi- 
sion during his leave, and he should have more 
clearly communicated to his research group 
and network of collaborators that he was taking 
time off. He should also have declined at least 
50% of requests for peer review and for help 
administering PhD-thesis defences. When the 
workplace does not support the parent on SPL 
fully, the other parent, usually mum, is also left 
unsupported and is unable to return to work 
as effectively. 

Fight for appropriate coverage at work 
while taking SPL. Nils was on (unpaid) leave 
for 6 weeks over 3 months, but we estimate 
that he was working for about 80% of his ‘nor- 
mal’ full-time hours rather than the 50% he 
was paid for during that period. Of course, this 
is partly due to his own conscientiousness — 
many scientists work far beyond their salaried
hours, especially if they have a group that
they feel responsible for. You might think that 
cover is not needed for such a short period — 
however, it absolutely is. That 6 weeks (or 30 
days) of work needed to be done by someone. 
Our institution’s policy is that time should be 
split 40:40:20 between research, teaching and 
admin. Assuming cover is required only for 
teaching and admin, we needed assistance 
for 18 days, either from someone inside the
institution or from an external short-term
contract worker. We suggest approaching 
human resources, the head of your depart- 
ment or your institution’s equality and diver- 
sity committee before your leave begins in 
order to request this assistance. 
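
The 18-day figure can be checked with a short sketch. It assumes a five-day working week, which matches the stated 6 weeks (or 30 days); the function and variable names are illustrative only and are not part of any institutional policy.

```python
from fractions import Fraction

def cover_days(weeks_of_leave, split, duties_needing_cover, days_per_week=5):
    """Return the working days of cover needed for the listed duties.

    `split` maps each duty to its share of working time.
    days_per_week=5 is an assumption; the text only states 6 weeks = 30 days.
    """
    total_days = weeks_of_leave * days_per_week
    total_share = sum(split.values())
    covered_share = sum(split[duty] for duty in duties_needing_cover)
    return Fraction(total_days * covered_share, total_share)

# 40:40:20 split between research, teaching and admin, as described above.
workload = {"research": 40, "teaching": 40, "admin": 20}
days = cover_days(6, workload, ["teaching", "admin"])
print(days)  # 18
```

Teaching and admin together make up 60% of the time, and 60% of 30 days is 18 days.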

We can’t be sure, but we think that one period 
of continuous leave might have helped mat- 
ters. Our colleagues and Nils’s research group 
might have found support elsewhere when they 
needed it, and it might have been easier for him 
to really switch off and be a stay-at-home dad 
for a short while.

Don’t be too hard on yourself in the first few 
months with the baby. And once you're back at 
work, it takes a while to catch up with research 
— so enjoy having that time to think about 
non-baby subjects, get up to speed with new 
research and spend time with your colleagues. 
If you've implemented the tips above, you can
be relaxed in the knowledge that your baby is 
at home having a ball. 

Euan, now almost one and a half years old,
is in a nursery three days per week, Nils is back
to full-time work and I am adjusting to working
part-time. Really, the challenge of navigating
this new normal is just beginning. We'll not have
a chance to relive our baby’s first year, so we’re
hopeful that this post and our advice will help 
other new parents to get the most out of SPL. 


Lynsey Bunnefeld is a lecturer in ecology and 
evolution at the University of Stirling, UK. 


588 | Nature | Vol 574 | 24 October 2019 


YOU ARE NOT AN IMPOSTOR


Ways to control the voice in your head that insists 
you're not good enough. By Desiree Dickerson 


As I sit down to write this piece, a voice
in my head tells me: “You can’t do
this,” and “Who do you think you are?”
Tension grows. Writing about well-being
starts to stress me out. “This needs
to be perfect,” the voice continues.

This voice is not unique to me; we all have 
one. It is a product of our beliefs and our mindset.
It influences how we perceive the world,
our position in it and how we think, feel, act and
interact. 

It has driven many of us to academic accolades
and career advancement — both measures
of success according to most social standards. 

But for some of us, this voice can denounce us 
as ‘impostors’ in academia and demand that we
work twice as hard. Gradually, every day begins 
to feel like the morning of an examination. New 
ideas are dismissed with negative thoughts such 
as: “If I thought it, then it must be obvious.” We
read and reread to see how others have said 
what we want to say, because surely they said 
it better and more clearly. We silence our curi- 
osity and don’t speak up in lectures or meetings, 
missing invaluable learning opportunities. 

The pursuit of excellence might have driven 
us to get high marks at university, but this per- 
fectionism has become so ingrained that it fuels 
our need to forfeit rest as we work through the 
weekend. It underlies our tendency to amplify 
the criticism over the praise. We drag out dead- 
lines as we search for something ‘better’ or 
‘more perfect’. Academia might benefit from 
this imbalance, but often our health as scientists 
does not. 

Looking back, I can see that this voice played a
large part in my departure from academia. Now 
that I run well-being and resilience workshops 
for academic institutions across Europe, and 
work one-to-one with academics as an
academic-resilience coach, I know I am not alone.

After leaving academia, I decided to apply 
my skills as a clinical psychologist to change the
narrative. First, I needed to dial down the fear
and self-doubt that were so easily evoked in me.

To do that, I had to recognize the voice for 
what it was — a negative influence that I was 
allowing to make big life choices for me. I had 
to challenge the internal dialogue telling me 
I wasn’t good enough, and to equip my new 
voice with arguments that recognized my 
strengths rather than magnified my fears. I realized
that I had to develop a voice that could be
compassionate in the face of setbacks — that
would talk to me as I would talk to a good friend.

And, crucially, I needed to challenge the
behaviours — avoidance, procrastination — that 
were empowering that voice and maintaining 
the cycle of self-doubt. These behaviours, of 
course, made me think that my old voice was 
right, that “I clearly wasn’t good enough.” 

Writing a new script for the voice in my head
is an ongoing process. I can’t say I've killed off 
the character entirely, but it no longer plays 
the lead part. To complement the cognitive 
behavioural techniques that I used to rewrite 
my voice, certain specific, learnable exercises 
have helped me to gain more control. 

I started practising mindfulness meditation
to gain more control over where and how often 
my mind wanders. It helps me to be less emo- 
tionally reactive to things like criticism and 
feedback, less preoccupied by the progress of 
others and better able to focus on what I want 
to bring to the table. If you're interested, Mark 
Williams and Danny Penman’s Mindfulness: An 
Eight-Week Plan for Finding Peace in a Frantic 
World (2011) was a good starting point for me. 

I restructured my day to prioritize activities
that make me most productive. I rate my sleep 
above all things and I exercise, no matter the 
deadlines, because I know it helps me to man- 
age stress better, think more clearly and focus 
for longer (and it just makes me a much nicer 
human — to myself and to others). 

By muting parts of that inner voice — the ones 
centred on perfection, worry, fear and guilt — 
you too can create space. Mental space and 
energy can be freed up to think, create, be pres- 
ent, ask questions, learn and relax. Imagine your 
life without that weight, without that constant 
pre-exam tension. Imagine academia without it.


Desiree Dickerson is a neuroscientist 
and a clinical psychologist 
(www.desireedickerson.com).

I spend a lot of time with jet engines as
part of my PhD, trying to improve them.
This one is a Concorde engine — it’s a
demonstration model that’s used to show 
where the various components of the 
engine are located, and it sits on the ground 
floor of the institute where I work. 

Jet-engine turbines can reach temperatures 
of 2,000 °C. I’m exploring ways to use less air 
to cool them and to reduce carbon dioxide 
emissions. I test the performance of films 
that coat different engine-blade designs in a
one-metre-long research tunnel at one of the 
20 research facilities at the institute. 

As a PhD student, I get limited time with the
rig, which I use to test whether our models are
accurate. A mock turbine is wired with small, 
flexible tubes to monitor pressure as well as 
with tiny temperature sensors. 

The set-up leaves just enough space to sit at 
a computer table to record data.

The machine is deafening and I often do
experiments that involve ultraviolet light. 
Everyone has to wear ear protection and 
goggles. To ensure that my experiments run 
smoothly, I prefer to process and analyse data
early in the morning, late in the evening or at
weekends, when colleagues are largely absent. 
I’ve found that the small breakthroughs 
always come over the weekend. 

Since I left Kenya to start my PhD four years 
ago, I’ve relied on a strong support system
during tough times. Some personal rituals 
help me to focus. I play high-energy dance 
music — including Bongo, or Swahili hip hop. 

Getting my brain into a rhythm, just like 
when I’m running, helps me to relax and solve
problems when something is not working. 

I also like to keep my work area simple and
clutter-free to avoid distractions. I don’t have 
any posters or any other personal touches. It’s 
just me, my tunes and the rig. 

Oddly, as my experiments wind down, I 
feel like I've developed Stockholm syndrome. 
I’ve spent so much time here, yet I realize
that I’ve formed some kind of attachment to 
this place — and know that when I ultimately 
leave, I’ll miss it. 


Gladys Ngetich is a PhD student at the Oxford 
Thermofluids Institute at the University of 
Oxford, UK. Interview by Virginia Gewin.

It takes time to build research institutions of quality and substance,
but getting the right components together at the outset enhances 
the chances of success. The most successful among the Nature Index 
Young universities, defined as being aged 50 years or younger, are
making remarkable headway in attracting talent and rising up the 
ranks through high-quality research outputs and collaborations. 

The leaders of these higher-education institutions often cite similar 
reasons for their success. Many feel liberated from the traditions that 
characterize older institutions, and they list strong interdisciplinary 
cultures, a track record of innovation and the capacity to attract a more
diverse student population. 

The proliferation of new universities in the 1950s and 1960s has 
had remarkable effects on countries such as South Korea, where an 
explosion of higher-education opportunities has seen the proportion 
of 25-34-year-olds with a tertiary education surge from less than 2% 
at the time of independence in 1945 to 70% in 2017, among the highest 
rates worldwide. China and Singapore have also benefited from their 
commitment to revamping the research and education landscape. 

The institutions featured in this supplement are outstanding performers
in terms of the Nature Index metrics of article count (AC) and fractional
count (FC). AC is the total number of articles published
by an institution’s affiliated authors in the 82 publications tracked by the
Nature Index. FC measures each institution’s share of the contribution
to each article.
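
As a rough illustration of how the two metrics differ, the sketch below computes AC and FC for a toy set of articles. It assumes that each article carries a total FC of 1, shared equally among its authors, and it simplifies every author to a single affiliation. The institution names, article data and function name are invented for the example.

```python
from collections import defaultdict
from fractions import Fraction

def article_and_fractional_counts(articles):
    """Return (AC, FC) dictionaries keyed by institution.

    `articles` is a list of articles; each article is a list holding one
    institution name per author (one affiliation per author, a simplification).
    """
    ac = defaultdict(int)
    fc = defaultdict(Fraction)
    for author_affiliations in articles:
        # Assumption: each article's FC of 1 is split equally among authors.
        share = Fraction(1, len(author_affiliations))
        for institution in set(author_affiliations):
            ac[institution] += 1   # one article, however many authors
        for institution in author_affiliations:
            fc[institution] += share
    return dict(ac), dict(fc)

# Hypothetical data: two articles, three made-up institutions.
toy_articles = [
    ["Uni A", "Uni A", "Uni B", "Uni C"],
    ["Uni A", "Uni B"],
]
ac, fc = article_and_fractional_counts(toy_articles)
print(ac["Uni A"], fc["Uni A"])  # 2 1
print(ac["Uni C"], fc["Uni C"])  # 1 1/4
```

An institution with a few authors on many large collaborations can therefore have a high AC but a modest FC.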

The institutions pride themselves on promoting creative thinking, and 
offering leadership opportunities to young- and mid-career researchers 
who are encouraged to pursue unconventional research that sparks 
invention. As Christopher Barner-Kowollik, a macromolecular chemist
at Queensland University of Technology in Australia, puts it: “Innovation 
occurs at the flanks of research, not the mainstream.” 

And yet, just as industry start-ups often struggle to make it past the 
first five years, young universities have an uphill battle ahead of them 
once the initial cash injections to establish them cease. Only those that 
have built reputations to rival their older peers will survive. 


Bec Crew 
Senior editor

Grégoire Courtine and Léonie Asboth, whose studies on rats at EPFL led to an implantable device to help paraplegic patients.

FAST LANE TO
THE FUTURE 


Unconstrained by centuries of convention, in 
the race for solutions these high performers are 
setting the pace in diverse research fields. 


SWISS FEDERAL INSTITUTE 
OF TECHNOLOGY LAUSANNE 


2018 FC: 219.92 | AC: 542 


Faculty: 4,700* | Students: 11,134 


PhD graduates: 2,216 


An implantable device that has restored the 
ability of three patients with paraplegia to 
walk is one of the most promising innovations 
in development at the Swiss Federal Institute 
of Technology Lausanne, Switzerland (EPFL). 

The wireless implant, made up of an array of 
electrodes stretched over the spinal cord, targets 
individual muscle groups in the legs to mimic the
signals fired in the brain when walking. 

Volunteers in the clinical trial, David Mzee, 
Gert-Jan Oskam and Sebastian Tobler, have 
endured months of training and physical therapy 
to regain voluntary control over their leg mus- 
cles after several years of paralysis. They are now 
able to walk with the aid of crutches or a walker. 

The study, published in two papers last year 
in Nature and Nature Neuroscience, is led by 
Grégoire Courtine, a neuroscientist at EPFL’s 
Brain Mind Institute, and Jocelyne Bloch, a 
neurosurgeon at the Lausanne University Hos- 
pital (CHUV). The researchers saw continuous 
improvements in the patients’ motor function, 
even after the device was switched off. 

The research follows a study published in 
early 2018, led by Courtine’s colleague, Léonie 
Asboth, which produced similar results in para- 
lysed rats. The team observed for the first time 
how the brain can reroute motor commands 
through alternative pathways to the spinal cord. 

“During a thesis, we all wonder at some 
point if what we’re doing is going to have an
impact,” says Asboth. “Being able to see the first
implications of this research on patients with 
spinal cord injury was very rewarding for all 
of us.” 

An EPFL spin-off company, GTX medical, 
co-founded by Courtine and Bloch, is now 
developing the technology for use in hospitals 
and clinics. EPFL is exclusively licensing the pat- 
ents to GTX medical, while also hosting many 
of its 40 researchers and clinicians. 

Known formerly as the Ecole polytechnique 
de l'Université de Lausanne, EPFL was estab- 
lished in 1969 as a university in its own right 
following a decision by the Swiss parliament to 
create a second federal institute of technology 
in addition to ETH Zurich. It is the third-highest
ranked young university in the Nature Index 
and Switzerland’s only representative among 
the leading 100 young universities. Bec Crew 
*includes technical staff 


SHANGHAITECH UNIVERSITY 


2018 FC: 36.34 | AC: 164 


Faculty: 521 | Students: 3,165 


PhD graduates: 41 


China’s ShanghaiTech University’s high-quality 
research output has grown rapidly since its 
foundation in 2013. It is the world’s fourth-fastest
rising young university and is ranked 22nd
among the Nature Index Young universities. 

Ning Zhijun, assistant professor in the School 
of Physical Science and Technology, says he was 
attracted to ShanghaiTech’s systematic mate- 
rials science research, through which he and 
his team have produced high-profile papers in 
Nature, Nano Letters and ACS Nano.

Hye Young Cho (left) and Young Shin Yoo at UNIST focus on new cells that are safer than conventional lithium-ion batteries.


“Unlike established institutions, ShanghaiTech
as a new university does not impose heavy
publication pressure on young scientists,” 
he says. “Our university offers generous 
research funding, so we can take on original 
studies that might not be published in the 
short term.” 

Zhong Chao, a nanomaterials scientist in the 
physical science school, says that Shanghai- 
Tech differs from older Chinese universities 
because its young scientists are not required 
to join teams of senior scientists when they 
are first recruited. This means they are free 
to explore their own research interests with 
fresh eyes. 

“In a new university like ShanghaiTech, there
aren’t so many established figures, so young
scientists can independently take on some risky
research, which is potentially more innovative,”
says Zhong. Peer collaboration is also easy, he 
says, unencumbered by the need to consider 
seniority. 

ShanghaiTech was jointly launched by the 
Shanghai municipal government and Chinese 
Academy of Sciences (CAS). As is the case with
other young universities, it was not immediately
permitted by China’s Ministry of Education
(MOE) to recruit doctoral students in its
own name, but CAS as a co-founder has helped
it overcome this difficulty. 

“CAS recruited doctoral students for us in 
the name of its Shanghai branch, and then 
transferred these students to us,” says Zhong. 
In 2018, the MOE allowed the university to inde- 
pendently enrol its first doctoral students in 
materials science and engineering. Hepeng Jia 


ULSAN NATIONAL INSTITUTE 
OF SCIENCE AND TECHNOLOGY 


2018 FC: 68.88 | AC: 161 
Faculty: 322 | Students: 5,272 


PhD graduates: 116 


Sang-II Seok has a vision for his lab’s perovskite 
solar cells: covering the decks of a crude-oil 
tanker and supplying clean power to the vessels 
that haul the dirtiest of fuels. It’s a juxtaposition
familiar to his university, the Ulsan National 
Institute of Science and Technology (UNIST), 
which is the tenth-highest ranked young university
in the Nature Index.

The city of Ulsan is known as an industrial
hub of South Korea, but since its establishment
in 2009, UNIST has earned a reputation
for clean energy research, including batteries 
and solar cells. 

“UNIST has become one of the strongest 
places to study green technology in Korea,” 
says Seok, a materials scientist whose research 
group specializes in solar cells that use a perov- 
skite compound as the light-absorbing layer, 
which is easier to fabricate than more common
materials such as silicon. 

In 2017, Seok and his colleagues set 
the benchmark in perovskite-solar-cell effi- 
ciency of 22.1%. That benchmark has since been 
raised to 25.2%, just behind silicon at 27.6%. 

Seok’s early success at UNIST was bolstered 
by funding for his lab of more than one billion 
South Korean won (US$826,000) from the 
university, part of an initial investment in its 
new campus. 

UNIST’s focus on building a workforce skilled 
in new energy technologies is more important
to the local community than ever, as the Korean
shipbuilding industry, which was booming 
when the institution was first established, has 
suffered a significant downturn in recent years. 
“We need to educate the students who can get 
these good jobs in the near future,” says Seok. 




Established to challenge South Korea’s top 
technical universities, UNIST mandated that all 
courses are to be taught in English to boost its 
international competitiveness. This is crucial 
for researchers in its Fluidics and Reactions 
Using Integrative Technology and Science 
(FRUITS) Lab, for example, who collaborate 
with EPFL in Switzerland, the National Uni- 
versity of Singapore, and several teams in the 
United States on cell-to-cell communication, 
lab-on-a-chip technology, and nanodevices for 
use in medical research. 

“There is no borderline in science,” says 
FRUITS Lab group leader, Yoon-Kyoung Cho. 
“Many of my students have learned self-con- 
fidence through active international collab- 
orations. We have all our meetings in English, 
which was not easy in the beginning, but it 
becomes so natural that the students ask good 
questions at big international conferences, 
which makes me proud.” Mark Zastrow 


SHENZHEN UNIVERSITY 


2018 FC: 52.48 | AC: 179 
Faculty: 3,647 | Students: 34,156 


PhD graduates: 26 


Following Shenzhen University’s (SZU) estab- 
lishment in 1983, two of China’s top universi- 
ties, Peking and Tsinghua, seconded teaching 
staff to the fledgling institution. The move 
was to support the city of Shenzhen’s develop- 
ment as one of four ‘special economic zones’ 
in southeastern coastal China, which were 
created in 1980 to attract foreign investment 
and technology. 

High-profile alumni who have since cut 
their teeth at SZU include computer scientist, 
Ma Huateng, who is the founder and chief executive
of the Chinese social media behemoth,
Tencent, and software engineer, Shi Yuzhu, 
who set up the online gaming company, Giant 
Interactive Group. Alumna Tu Hongyan, now 
chairperson of Hangzhou-based silk brand, 
Wensli, was named one of Forbes’ top Chinese 
women in business in 2018. 

Today, one of SZU’s most highly cited sci- 
entists is Zhang Han, a professor of optics 
and photonics. Han uses graphene and other 
two-dimensional materials to create laser 
photonics devices, which have applications in 
fields such as medicine, communications and 
quantum information science. 

“Our research received heavy investment 
from the Shenzhen municipal government, 
which considers new materials as one of its 
priority high-tech industries,” Han told Nature
Index. 


Han joined SZU in 2013 as a ‘young thou- 
sand-talent’, part of China’s Thousand-Talent 
scheme, launched in 2008 to attract leading 
scholars. He says SZU’s advantages over more 
established universities in China include the 
encouragement it offers to young scientists 
who are keen to pursue new research areas, 
and its strong support for international 
collaboration. 




The university has partnerships with 256 uni- 
versities overseas for collaborative research 
and student training. It also has strong links 
to local industry due to Shenzhen’s status as 
a high-tech hub. 

Last year the university's total research 
budget exceeded 1.1 billion yuan (US$153 mil- 
lion), up from 100 million yuan in 2013, and it 
received 302 grants from the National Natural 
Science Foundation of China. 

SZU is the third-fastest rising young 
university in the world and is ranked 13th 
among the Nature Index Young universities. It
is also the third-fastest rising young university 
in the fields of chemistry and physical sciences. 
Hepeng Jia 


DAEGU GYEONGBUK INSTITUTE 
OF SCIENCE AND TECHNOLOGY 


2018 FC: 19.99 | AC: 53 
Faculty: 260 | Students: 1,449 


PhD graduates: 26 


When robotics engineer, Hongsoo Choi, visits 
his neuroscience collaborators to work on their 
medical microrobots, he doesn’t even have to 
walk outside. All six departments of the Daegu 
Gyeongbuk Institute of Science and Technol- 
ogy (DGIST) are housed within a tight cluster 
of buildings constructed in 2010. “It helps a lot,
actually,” he says. “I can just stop by for discussions
if I need. That’s a big advantage.”

The layout reflects one of the South Korean 
university’s core principles: ‘convergence’, its 
preferred term for an interdisciplinary mindset 
in research and study. 

In May 2019, the team of roboticists, engi- 
neers and neuroscientists reported that they 
had created sphere- and helix-shaped micro- 
robots that can deliver transplanted stem cells 
inside a live mouse. When a magnetic field is
applied, the bots can roll along the walls of
blood vessels or swim through fluids, carrying 
stem cells to their target. 

In proof-of-concept experiments six years 
earlier, Choi guided microbots as they swam 
around plastic containers carrying kidney cell 
cultures. It was remarkable robotics, but rather 
mundane biology. “I wanted a more meaningful
experiment,” he says. 

This led him to the neuroscientists next door, 
who could culture neural stem cells from the 
hippocampi of mice, part of the brain involved 
with memory, learning and emotion. The study,
published in Science Robotics, reports that the 
microbots could carry these tiny payloads 
until they differentiated into several types of 
brain cells. They also navigated the arteries of 
a dead rat’s brain, demonstrating the potential
to deliver therapeutic cells to targeted areas to 
potentially restore brain functionality. 

Choi’s team involves a number of young 
researchers with diverse expertise in robot- 
ics, such as Junhee Choi, who in a separate 
project helped to develop an ultrasonic device 
for root-canal treatment, and PhD candidate, 
Eunhee Kim, who focuses on the adhesion 
properties of microrobots in regenerative 
medicine. 

DGIST, which was established just 15 years 
ago, actively encourages cross-disciplinary 
projects, as they have a better chance at being
awarded internal grants, says Choi. “It’s not a
rule, but it’s the culture.” The university is the 
seventh-fastest rising young university and is 
ranked 50th among the top young universities 
in the Nature Index. Mark Zastrow 


OREGON HEALTH 
AND SCIENCE UNIVERSITY 


2018 FC: 55.48 | AC: 151 
Faculty: 2,900 | Students: 4,706 


PhD graduates: 28 


Brian Druker, director of the Oregon Health 
and Science University’s (OHSU) Knight Can- 
cer Institute in Portland, in the United States, 
is developing targeted treatments for acute 
myeloid leukaemia (AML), the most common 
form of blood cancer in the United States. This 
highly lethal disease affects the myeloid cells 
in bone marrow and kills more than 10,000 
people in the United States annually. 

Over the past 40 years, progress towards 
new, more effective treatments for AML has 
been slow. “There have been few changes in the 
way this form of leukaemia has been treated,” 
says Druker.

QUT nanotechnologist, Jennifer MacLeod, is studying how to grow and modify two-dimensional materials.


Two decades ago, Druker was involved in 
the development of the first targeted drug 
for chronic myelogenous leukaemia, a slow- 
growing form of leukaemia. The treatment, 
marketed as Gleevec, transformed the disease 
from a life-threatening illness to a managea- 
ble condition, with 90% of patients living for 
at least five years after diagnosis. 

In 2016, Druker and his colleagues estab- 
lished the Beat AML programme, a long-term 
collaborative clinical trial that aims to uncover 
targeted treatments for various forms of acute 
myeloid leukaemia. 

As part of this work, the team generated 
the largest data set of its kind, drawn from 
672 tumour samples from 562 patients. The 
findings, published in Nature in October 2018, 
will help researchers pinpoint which genetic 
markers are sensitive or resistant to treatment, 
to inform future clinical trials. 

OHSU, the former University of Oregon 
Health Center, became independent of the 
university in 1974, and was renamed in 1981. It 
is now the largest employer in Portland, and the
only academic health-care centre in the state, 
comprising three campuses, two hospitals and 
numerous clinics. 

OHSU is the most prolific young university 
in life sciences research output in the Nature
Index, and is ranked 11th among young universities
in the Nature Index.

Working at a relatively young institution 
has enabled Druker to “move quickly and get 
things done without having to navigate layers 
of bureaucracy”. Having the freedom to think 
outside the box is another advantage. “Being 
willing to embrace ideas that might be con- 
sidered outside the mainstream has allowed 
us to develop new and paradigm-changing 
research,” says Druker. Gemma Conroy 


QUEENSLAND UNIVERSITY 
OF TECHNOLOGY 


2018 FC: 27.85 | AC: 117 


Faculty: 2,110 | Students: 47,592 


PhD graduates: 327 


A synthetic material that can shift its structure
under different light conditions has been devel- 
oped by macromolecular chemist, Christopher 
Barner-Kowollik, at the Queensland University 
of Technology (QUT) in Australia. 

When the material is exposed to green LED 
light, its chemical bonds strengthen to produce
a hard, stable structure. In darkness, the material
transforms into a soft, liquefied mass. Not
only is the material reprogrammable, but it’s 
inexpensive to produce, consisting of just two 
chemical compounds. One of them, naphthalene,
is an active ingredient in moth repellents.

This light-stabilized dynamic material could 
be used as a 3D-printing ink for creating tem- 
porary scaffolds that support free-hanging 
structures, which are notoriously difficult to 
print using current methods. 

In an effort to advance understanding of the
material, Barner-Kowollik and his team published
in the Journal of the American Chemical
Society, in June 2019, rather than patent it, in the
hope that other teams will explore its potential.
They are part of an international collaboration 
involving Ghent University in Belgium and 
Karlsruhe Institute of Technology, Germany. 

The QUT researchers are based in the university’s
science and engineering faculty, which is
one of the largest university faculties in Australia.
They work alongside nanotechnologist,
Jennifer MacLeod, who uses scanning probe
microscopy and X-ray photoelectron spectros- 
copy to investigate how to grow and modify 
two-dimensional materials, such as graphene. 

QUT was established just 30 years ago, after 
operating for 20 years as the Queensland
Institute of Technology. It’s the sixth-fastest
rising young university, as tracked by the 
Nature Index, and is ranked 30th of the young 
universities in the index. 

Whether it’s designing a new laboratory 
space or finding new collaborators, Barner- 
Kowollik says he’s able to effect change quickly 
at QUT, and has been given the freedom to 
explore new kinds of research questions. 
“Innovation occurs at the flanks of research, 
not within the mainstream,” he says. 

QUT has two campuses in Brisbane, and 
in 2013, it opened the Science and Engineer- 
ing Centre at its Gardens Point campus. In 
addition to teaching spaces and educational 
facilities, the centre houses the Institute for 
Future Environments, which brings together 
more than 300 scholars from different fields 
to collaborate on large-scale projects relating 
to natural, built and digital environments. 
Gemma Conroy 


UNIVERSITY OF PARIS-SUD 


2018 FC: 71.08 | AC: 574 
Faculty: 4,300 | Students: 31,800 
PhD graduates: 688 


Francois Costard, a geomorphologist at the
University of Paris-Sud, is at the forefront of 
work investigating surface features on Mars for 
historic evidence of oceans. His latest paper, 
published earlier this year in the Journal of 
Geophysical Research: Planets, suggests that 
the Lomonosov crater in the planet’s north 
could have been the source of a mega-tsunami 
three billion years ago. 

The scenario involves an asteroid collision 
of similar impact to the one that wiped out the 
non-avian dinosaurs on Earth 66 million years 
ago. In the case of Mars, it’s thought that the 
asteroid slammed into a shallow ocean, causing 
a massive wave to form. The research provides 
evidence that liquid water could have persisted 
on Mars for millions of years. 

“This has implications for the total inventory
of water on Mars, how it evolved, and the
potential for the origin and survival of life on 
the red planet,” says Costard, director of the 
planetary geomorphology team at the Uni- 
versity of Paris-Sud and director of research 
at France’s National Center for Scientific 
Research (CNRS). 

He notes the advantages of research at 
a young institution like the University of 
Paris-Sud: “The youth of our institution favours 
the possibility of us having young research sci- 
entists and financial support for new, especially 
interdisciplinary, programmes.” 


A true alien landscape, these surreal sand 
dunes were photographed by the Mars 
Reconnaissance Orbiter near the red 
planet's north pole. 


Originally part of the University of Paris, 
Paris-Sud was established as a university in its 
own right in 1970, and now has several cam- 
puses in the southern suburbs of Paris, includ- 
ing its main campus in Orsay. It is ranked ninth 
in the young universities in the Nature Index, 
and its highest subject rank is in the physical 
sciences, where it is also placed ninth. 

In 2014, the University of Paris-Sud was a 
founding member of the University of Paris-
Saclay, a ‘mega-university’ that brings together
19 universities, colleges and research centres in
the south of Ile-de-France. By 2020, when the 
University of Paris-Sud will be officially inte- 
grated into the University of Paris-Saclay, the 
consolidated institution will represent 15% of 
France’s total research output. Bec Crew 


HONG KONG UNIVERSITY 
OF SCIENCE AND TECHNOLOGY 


2018 FC: 108.39 | AC: 310 
Faculty: 680 | Students: 11,205 
PhD graduates: 273 


Despite being a relatively small institution, the 
28-year-old Hong Kong University of Science 
and Technology (HKUST) has been consistently 
ranked among the world’s top young universi- 
ties, noted for its growth and strong reputation. 

In the Nature Index, HKUST is the fifth-highest
ranked young university. It was ranked 32nd
in the QS World University Rankings 2020, published
by UK education company, Quacquarelli
Symonds, and came second in its 50 Univer- 
sities Under 50 ranking. It took the top slot in 
the 2019 Times Higher Education Young Uni- 
versities Ranking for the second year running. 

In addition to its traditional areas of strength
— computer science, quantum physics and 
medical sciences — HKUST has achieved signifi- 
cant progress in molecular neuroscience, where 
researchers are investigating the proteins that 
promote the development of neurons and the 
mechanisms underlying neurodegenerative 
diseases, such as Alzheimer’s. 

HKUST’s vice-president of research and 
development, Nancy Ip, says that the univer- 
sity’s flexible and efficient decision-making and 
its ability to define its own traditions and swiftly 
adapt to challenges have brought international 
recognition. She says the university's location 
is also an advantage, facilitating collaboration
with international and Chinese universities. 

The career path of Qian Zhang, a computer 
scientist who joined HKUST in 2005 from 
Microsoft Research Asia, is an example of such
collaboration. She has connected with a num- 
ber of industrial partners, including Microsoft 
and Intel, to develop new wireless connection
technologies. In 2009, she established a joint 
lab with Chinese telecoms giant, Huawei. 

Zhang is the inventor of more than 50 
granted and 20 pending international patents, 
and in 2016 became the university’s youngest 
endowed chair professor. “Compared with 
other more established universities in Hong 
Kong, the key to HKUST’s success is its openness,”
she says. Hepeng Jia


New universities with broad
focus are spreading knowledge 
globally. By Philip G. Altbach 


In the past half-century, the higher-education sector
has mirrored the patterns of many luxury retailers: it
has embraced massification and extended its product
range to a wider market through the proliferation of
new participants. The number of new academic institutions
established worldwide in recent decades is unknown
but undoubtedly runs into many hundreds.

The vast majority of these new universities would not 
appear in the Nature Index, which measures high-quality 
research outputs, or in the Academic Rankings of World Universities
(Shanghai rankings) or Times Higher Education's
World University Rankings. This is because most young 
universities are local institutions focused on teaching young 
people, many of whom are the first in their families to attend
a post-secondary institution. 

Indeed, the rankings, including Nature Index’s, over- 
emphasize the elite research-intensive sector of post-sec- 
ondary education. Teaching excellence is undervalued 
or ignored, in part because measuring quality is difficult. 
Research productivity has traditionally been seen as most 
prestigious. 

Since 2000, the number of students in higher education 
has more than doubled, exceeding 210 million globally, with 
the majority of this expansion taking place in developing 
and middle-income countries. Where governments have 
been slow or unable to invest in expansion, the private sec- 
tor has taken over, bringing tremendous variation to young 
universities. Private universities now represent the fastest 
growing segment of post-secondary education worldwide. 

Most of the young universities profiled by Nature Index 
for their impressive research performance are public insti- 
tutions in Asia, Australia and Europe. It is significant that 
none are in Africa or Latin America, where higher-education
investment and quality have lagged behind the rest of
the world. In North America, the new universities that have 
achieved excellence are now not so young, having mostly 
been established in the expansion period of the 1960s. Institutions
such as the University of California, San Diego, the
State University of New York at Stony Brook and the Univer- 
sity of Waterloo in Canada are good examples. 

The Nature Index’s interest, of course, lies with STEM- 
focused institutions, and it is worth noting that recent major 
investments in higher education have been made in univer- 
sities that are strong in science and technology. Publishing 
in STEM fields typically yields more citations, so raises the 
global visibility of both the researchers and their universi- 
ties. But some excellent young universities that focus more 
broadly have been established in the past two decades.
In India, key examples are the Shiv Nadar University and
O. P. Jindal Global University, both near Delhi, and Azim
Premji University in Bengaluru, which focuses on research
and training in education. These private non-profit universities,
founded by philanthropists with deep pockets, are
pioneers in their governance, curriculum and orientation
compared with other Indian universities.


The number of students in higher education has more than doubled since 2000.

In the United States, the Olin College of Engineering, 
established in 1997 with a US$460-million grant from the 
Olin Foundation, aims to revolutionize undergraduate engi- 
neering education by crossing disciplinary boundaries and 
eliminating lectures. Jacobs University in Bremen, Germany, 
is another ‘new model’ university, functioning entirely in 
English, with a majority of international students. 

These dynamic institutions illustrate a growing trend 
around the world, perhaps harking back to the early twen- 
tieth century in America, where the Rockefellers (Univer- 
sity of Chicago) and Stanfords (Stanford University) were 
bankrolling new universities with new ideas. 

The most successful young universities, whether public 
or private, are characterized by significant investments, 
innovative ideas about governance, curriculum and 
social responsibilities, and forward-thinking leadership. 
Whether all will succeed in the long run, as practices become 
entrenched and funding may dwindle, is unclear, but these 
impressive young institutions are beacons for the future of
higher education.


Putting youth on the map

Global research ranks are reordered when
visualized according to the output of
young universities in the Nature Index.

Each square represents a fractional count (FC)
rounded to five. Fractional count measures the 
share of authorship of each article. Data are for 
2018. Colour represents the young universities’ 
contribution to each country/region’s total 
research output in the Nature Index. 
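
As a rough illustration of how a fractional count can be tallied, the Python sketch below assumes that each article carries a total FC of 1, split equally among its authors, and that each author has a single affiliation. The institution names and the helper `fractional_counts` are hypothetical, and the Nature Index's own calculation may treat details such as multiple affiliations differently.

```python
from collections import defaultdict
from fractions import Fraction


def fractional_counts(articles):
    """Return {institution: FC} for a list of articles.

    Each article is given as a list with one institution name per author.
    Assumption: an article has a total FC of 1, shared equally among
    its authors, and each author has exactly one affiliation.
    """
    totals = defaultdict(Fraction)
    for affiliations in articles:
        if not affiliations:
            continue  # skip articles with no listed authors
        share = Fraction(1, len(affiliations))
        for institution in affiliations:
            totals[institution] += share
    return dict(totals)


# Hypothetical data: two articles, three institutions.
articles = [
    ["University A", "University A", "University B", "University C"],
    ["University B", "University B"],
]

for name, fc in sorted(fractional_counts(articles).items()):
    print(f"{name}: FC = {float(fc):.2f}")
```

In this made-up example, University B has one of four authors on the first article and both authors on the second, so its FC is 0.25 + 1 = 1.25, while the FCs of all institutions together add up to the number of articles.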
THE AMERICAS

United States 221.29 (1%)

The United States has the highest output in the
Nature Index overall, but the contribution of its young
universities is relatively low. The Oregon Health and Science
University (11th), the University of Texas at Dallas (17th) and
the University of Alabama at Birmingham (25th) are its
best-performing young universities for high-quality research
output in the natural sciences.

Brazil 29.71 (10%)
Canada 17.33 (1%)
Chile 12.06 (12%)


EUROPE


Germany 240.20 (5%) 


Germany has 11 of the top 100 
young universities in the Nature 
Index, the same number as 
China, but the collective output 
of Germany’s young universities 
is less than one-third that of their
Chinese peers. 


Switzerland 227.90 (16%)

EPFL is the only Swiss university
among the top 100 young
universities in the Nature Index.


France 215.63 (10%)
Spain 165.24 (15%)
Italy 94.20 (9%)
Norway 51.95 (27%)
Austria 51.92 (15%)
United Kingdom 37.37 (1%)
Ireland 17.20 (16%)
Russia 15.75 (3%)
Greece 12.21 (16%)
Luxembourg 12.02 (62%)
Netherlands 11.90 (1%)
Czech Republic 10.34 (5%)
Denmark 8.55 (2%)
Finland 6.73 (3%)
Cyprus 4.98 (54%)


WORLD LEADERS

China and Germany have the greatest number of
young universities among the 100 young leaders in the
Nature Index. Saudi Arabia and Singapore’s few young
institutions contribute the most to their countries’ output.

China 11
Germany 11
India 10
Australia 9
South Korea 8
United States 8
Spain 5
France 5
Italy 4
Japan 4


ASIA PACIFIC AND MIDDLE EAST

Mainland China 841.47 (8%)

The University of Chinese Academy of Sciences is the
highest ranked young university in the Nature Index, with more
than three times the article count of the nearest competitor
(NTU Singapore). The top four fastest-rising young universities
in the Nature Index are in China.

South Korea 401.04 (30%)

China has the highest fractional count (FC) of any country,
as contributed by young universities. South Korea comes
in second, with less than half of China’s FC.

Singapore 245.69 (41%)

Nanyang Technological University, with an FC of 232.51,
is ranked number two among young universities in the
Nature Index. Singapore is one of several countries with
two institutions in the young universities top 100, including
Norway, Austria, Israel, Sweden, Brazil and Portugal.


India 244.56 (26%)

The Homi Bhabha National Institute, with an FC of 42.31,
is the highest-ranked young university from India at 16th
place overall, and the second-highest young graduate
university. IISER Pune is second among Indian young
universities at 20th place and IISER Bhopal is third, in 26th
position overall.

Australia 226.15 (18%)

Curtin University is Australia’s highest ranked young
university, at 23rd. Its closest competitor, the Queensland
University of Technology, in 30th place, is the sixth-fastest
rising young university in the world in change in FC between
2015 and 2018.

Saudi Arabia 104.77 (80%)

The King Abdullah University of Science and Technology,
at sixth, is the highest ranked young graduate university in
the index. It contributes a far greater proportion of total
research output than its young counterparts in any other
country.


Japan 85.46 (3%)
[country name not recoverable] 57.97 (10%)
Taiwan 32.41 (9%)
Turkey 12.30 (18%)
Iran 10.21 (9%)
Thailand 5.46 (14%)
Qatar 5.26 (44%)

SOURCE: NATURE INDEX. 


Ranked second among Nature Index’s 
young universities, Nanyang Technological
University is moving at an accelerated 
pace. 


Nanyang Technological University
(NTU) is one of Singapore's top research
institutes, and in recent years has emerged
as a global leader in driving the ‘fourth
industrial revolution’, a period defined
by disruptive technologies such as the
Internet of Things, robotics, virtual reality
and artificial intelligence.

Established in 1991, and now the 
second-most prolific young university in 
the Nature Index, NTU has climbed the 
global rankings in research output and 
reputation. 

Nature Index spoke with its president, 
Subra Suresh. 


How does NTU seek to engage with 
industry? 

NTU is engaging with some of the top 
industries from around the world. British 
jet-engine manufacturer, Rolls-Royce, 
for example, has partnerships with 29 
universities globally and their largest 
partnership is with NTU. This year we 
renewed a five-year contract with them, 
worth S$88 million
(US$63.5 million) to look at next- 
generation aircraft engines, 3D printing, 
digital manufacturing and many other 
topics. 

Chinese retail and e-commerce 
company, Alibaba, established a joint AI
research institute with NTU on our campus. 
It involves 25 Alibaba employees working 
with 25 professors here. And last year, 
American software company, Hewlett- 
Packard (HP), established its largest 
university partnership with NTU, with 
S$84 million in funding
over four years for digital manufacturing 
technologies. 

We also collaborate with Volvo. We 
converted its electric buses into fully 
self-driving vehicles, which are now being 
piloted on our campus. We're working 
with Singapore's Land Transport Authority 
and other government organizations to 
explore different types of autonomous 
vehicles.

We have more than 180 companies from
around the world on campus. These provide 
opportunities for faculty and students to 
connect research and education to industrial 
practice, and job opportunities for our 
students once they graduate. 


How has NTU's location contributed to its 
growth? 

China and India, the two most populous 
countries in the world, are culturally 
represented in the Singaporean population, 
which attracts Indian and Chinese students 
because of cultural affiliations, geographical 
proximity and the familiarity of living in Asia. 
We have more than 20,000 NTU alumni 
who occupy prominent positions in China 
today. This representation gives us a natural 
connection to China. 

Indonesia and Malaysia are also close 
to us and have historical ties to Singapore. 
Many top US companies have regional 
headquarters in Singapore, including HP and 
Procter & Gamble. This helps us to connect 
with industry. 

English is the primary language of 
Singapore, and our primary language of 
instruction. We have all the practices, 
policies and procedures of a Western 
university. Being located at the crossroads 
of Asia as a multicultural, multiracial society, 
but with a very strong Western focus, makes 
us unique. 


NTU was founded in 1991. How has it grown 
in such a short amount of time?

Twenty years ago, the Singapore government 
took a long-term view on the importance of 
having world-class universities here, making 
Singapore a destination for academic, 
industrial, entrepreneurial and innovation 
talent. 

They created new funding to make this 
possible and invested in university buildings, 
labs and facilities. At the same time, the 
Economic Development Board of Singapore 
aimed to attract companies to do high-end 
research. 


More than 180 companies on NTU’s campus
connect research to industrial practice.


What advantages do young universities
have?

Six years ago, NTU set up the Lee Kong
Chian School of Medicine in partnership with
Imperial College London. We developed
a modern curriculum using the latest
technology, online learning and continuous 
assessment. We're using virtual and 
augmented reality to teach subjects such as 
cardiology and anatomy. That’s hard to do 
in an established medical school because 
you have to retrain your medical doctors 
and professors before you can educate your 
students. 

It’s also easier for a new medical school to
leapfrog old technologies and equipment 
and go straight to the latest ones, while 
older universities have to abandon old labs 
to create new spaces. 


What are the challenges?

Even in a relatively young country such as
the United States, most of its well-known 
highly ranked universities have been around 
for a hundred years or more. There have 
been many experiments around the world to 
establish new universities, but most have not 
been able to make it into the global top 50 or 
100, even those with lots of funding. 

Many universities whose glory days are in 
the past are still highly ranked. It takes a long 
time for word to get around that you have 
reached your peak. This time lag applies in 
both directions, as it’s very difficult for young 
universities to crack the rankings. But NTU 
has consistently delivered, and now word 
is getting around. This year more than 430 
papers were published by NTU faculty in the 
top ten journals in the world. 

It’s difficult for young universities to 
compete with well-established institutions, 
but they see the value of partnering with 
us. We have strong partnerships with 
Massachusetts Institute of Technology (MIT) 
and Imperial College London, and we have 
a very strong partnership with the Technical 
University of Munich, Germany, in the area of 
robotics. In 2018, the Wallenberg Foundation 
of Sweden gave an endowment to NTU to 
support postdoctoral researchers. 


How do global metrics affect your 
strategies? 

We look at all of them, and at other metrics, 
such as where our faculty publish, the 
quality of the faculty we recruit and where 
they come from. In the past 18 months, for 
example, we've recruited from the University 
of Cambridge in the UK, American Ivy
League schools and from top institutions in
Europe and Asia. 

In January 2018, a couple of months 
after I started as president, we launched
the presidential postdoctoral fellows 
programme to attract the brightest young 
postdocs. This year we had 894 applications 
from 74 countries for 12 positions. 

About a year ago, the US Institute of 
Electrical and Electronics Engineers 
listed the top ten rising stars in artificial 
intelligence around the world. According to 
their assessment, three are NTU faculty. 

Good performance in a ranking can be a 
motivator, but one cannot take it as the only 
metric and the only reason to do well. 


What are your priorities from here?

We are doing our best to attract top talent 
from Singapore and from all over the world. 
This includes students, postdocs, faculty 
and staff. Our commitment to excellence in 
education and research comes second. It’s 
not research versus education. The two have 
to be integrated. To have impact we need
to make sure that research and education
connect with both societal and industrial 
impact. That's why government and industry 
partnerships are so important. 

Some demographic trends will affect all of 
the universities in Singapore in the next ten 
to 15 years. Our birth rate has been declining 
for many years, and because funding is tied 
to undergraduate student involvement, 
this decline will affect us all. We have an 
obligation to deliver value for the resources 
we get from the Singapore government. 

As a young university, we had access to 
significant new resources to help grow the 
university, but this upwards trajectory cannot 
be sustained forever. 

We have significant momentum and we 
will continue to grow in stature and impact 
in our output in education, research and 
innovation, but that doesn’t mean that the 
level of annual increase in funding will be 
the same over the next ten years. So, one 
priority we have is to continue our growth in 
excellence without necessarily continuing to 
grow in numbers. 


Interview by Catherine Armitage 
This interview has been edited for clarity and 
length.